RFC 2873: Inline assembly syntax

lang (asm)

Summary

This RFC specifies a new syntax for inline assembly which is suitable for eventual stabilization.

The initial implementation of this feature will focus on the ARM, x86 and RISC-V architectures. Support for more architectures will be added based on user demand.

The transition from the existing asm! macro is described in RFC 2843. The existing asm! macro will be renamed to llvm_asm! to provide an easy way to maintain backwards-compatibility with existing code using inline asm. However, llvm_asm! is not intended ever to be stabilized.

Motivation

In systems programming some tasks require dropping down to the assembly level. The primary reasons are performance, precise timing, and low-level hardware access. Using inline assembly for this is sometimes convenient, and sometimes necessary to avoid function call overhead.

The inline assembler syntax currently available in nightly Rust is very ad-hoc. It provides a thin wrapper over the inline assembly syntax available in LLVM IR. For stabilization a more user-friendly syntax that lends itself to implementation across various backends is preferable.

Inline assembly is widely used in the Rust community and is one of the top reasons keeping people on the nightly toolchain. Examples of crates using inline assembly include cortex-m, x86, riscv, parking_lot, libprobe, msp430, etc. A collection of use cases for inline asm can also be found in this repository.

Guide-level explanation

Rust provides support for inline assembly via the asm! macro. It can be used to embed handwritten assembly in the assembly output generated by the compiler. Generally this should not be necessary, but might be where the required performance or timing cannot be otherwise achieved. Accessing low level hardware primitives, e.g. in kernel code, may also demand this functionality.

Note: the examples here are given in x86/x86-64 assembly, but ARM, AArch64 and RISC-V are also supported.

Basic usage

Let us start with the simplest possible example:

unsafe {
    asm!("nop");
}

This will insert a NOP (no operation) instruction into the assembly generated by the compiler. Note that all asm! invocations have to be inside an unsafe block, as they could insert arbitrary instructions and break various invariants. The instructions to be inserted are listed in the first argument of the asm! macro as a string literal.

Inputs and outputs

Now inserting an instruction that does nothing is rather boring. Let us do something that actually acts on data:

let x: u64;
unsafe {
    asm!("mov {}, 5", out(reg) x);
}
assert_eq!(x, 5);

This will write the value 5 into the u64 variable x. You can see that the string literal we use to specify instructions is actually a template string. It is governed by the same rules as Rust format strings. The arguments that are inserted into the template, however, look a bit different than you may be familiar with. First, we need to specify whether the variable is an input or an output of the inline assembly. In this case it is an output, which we declared by writing out. We also need to specify in what kind of register the assembly expects the variable. In this case we put it in an arbitrary general purpose register by specifying reg. The compiler will choose an appropriate register to insert into the template, and after the inline assembly finishes executing it will copy the value of that register into x.

Let us see another example that also uses an input:

let i: u64 = 3;
let o: u64;
unsafe {
    asm!(
        "mov {0}, {1}",
        "add {0}, {number}",
        out(reg) o,
        in(reg) i,
        number = const 5,
    );
}
assert_eq!(o, 8);

This will add 5 to the input in variable i and write the result to variable o. The particular way this assembly does this is first copying the value from i to the output, and then adding 5 to it.

The example shows a few things:

First, we can see that asm! allows multiple template string arguments; each one is treated as a separate line of assembly code, as if they were all joined together with newlines between them. This makes it easy to format assembly code.

Second, we can see that inputs are declared by writing in instead of out.

Third, one of our operands has a type we haven't seen yet, const. This tells the compiler to expand this argument to a value directly inside the assembly template. This is only possible for constants and literals.

Fourth, we can see that we can refer to an argument by number or by name, as in any format string. For inline assembly templates this is particularly useful, as arguments are often used more than once. For more complex inline assembly, using this facility is generally recommended, as it improves readability and allows reordering instructions without changing the argument order.

We can further refine the above example to avoid the mov instruction:

let mut x: u64 = 3;
unsafe {
    asm!("add {0}, {number}", inout(reg) x, number = const 5);
}
assert_eq!(x, 8);

We can see that inout is used to specify an argument that is both input and output. This is different from specifying an input and output separately in that it is guaranteed to assign both to the same register.

It is also possible to specify different variables for the input and output parts of an inout operand:

let x: u64 = 3;
let y: u64;
unsafe {
    asm!("add {0}, {number}", inout(reg) x => y, number = const 5);
}
assert_eq!(y, 8);

Late output operands

The Rust compiler is conservative with its allocation of operands. It is assumed that an out can be written at any time, and therefore cannot share its location with any other argument. However, for good performance it is important to use as few registers as possible, so that they do not have to be saved and reloaded around the inline assembly block. To achieve this, Rust provides a lateout specifier, which can be used on any output that is written only after all inputs have been consumed. There is also an inlateout variant of this specifier.

Here is an example where inlateout cannot be used:

let mut a: u64 = 4;
let b: u64 = 4;
let c: u64 = 4;
unsafe {
    asm!(
        "add {0}, {1}",
        "add {0}, {2}",
        inout(reg) a,
        in(reg) b,
        in(reg) c,
    );
}
assert_eq!(a, 12);

Here the compiler is free to allocate the same register for inputs b and c, since it knows they have the same value. However, it must allocate a separate register for a, since it uses inout and not inlateout. If inlateout were used, then a and c could be allocated to the same register, in which case the first instruction would overwrite the value of c and cause the assembly code to produce the wrong result.

However the following example can use inlateout since the output is only modified after all input registers have been read:

let mut a: u64 = 4;
let b: u64 = 4;
unsafe {
    asm!("add {0}, {1}", inlateout(reg) a, in(reg) b);
}
assert_eq!(a, 8);

As you can see, this assembly fragment will still work correctly if a and b are assigned to the same register.
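The sketch below makes the register-sharing hazard concrete. It is a minimal plain-Rust simulation with no asm! involved. It replays the two examples above on a hypothetical machine with four registers. The helpers run_two_adds and run_one_add are illustrative names of our own. Each alloc array states which register index the allocator is assumed to have picked for each operand. This models only the semantics, not how rustc or LLVM actually allocate registers.

```rust
/// Simulates `add {0}, {1}` followed by `add {0}, {2}` on a toy
/// four-register machine. `alloc` holds the register index assumed
/// to have been allocated to a, b and c respectively.
pub fn run_two_adds(alloc: [usize; 3], a: u64, b: u64, c: u64) -> u64 {
    let [ra, rb, rc] = alloc;
    let mut regs = [0u64; 4];
    // Load inputs. Operands may only share a register when they hold the
    // same value, so the order of these writes does not matter.
    regs[ra] = a;
    regs[rb] = b;
    regs[rc] = c;
    // add {0}, {1}
    regs[ra] = regs[ra].wrapping_add(regs[rb]);
    // add {0}, {2}: reads c only after {0} has already been overwritten
    regs[ra] = regs[ra].wrapping_add(regs[rc]);
    regs[ra]
}

/// Simulates the single `add {0}, {1}` from the inlateout example.
pub fn run_one_add(alloc: [usize; 2], a: u64, b: u64) -> u64 {
    let [ra, rb] = alloc;
    let mut regs = [0u64; 4];
    regs[ra] = a;
    regs[rb] = b;
    // The only write to {0} happens after every input has been read.
    regs[ra] = regs[ra].wrapping_add(regs[rb]);
    regs[ra]
}
```

Take a = b = c = 4. Then run_two_adds([0, 1, 2], ...) and run_two_adds([0, 1, 1], ...) both give 12. However, run_two_adds([0, 1, 0], ...), which is the sharing that inlateout would permit, gives 16. run_one_add([0, 0], 4, 4) still gives the correct 8.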

Explicit register operands

Some instructions require that the operands be in a specific register. Therefore, Rust inline assembly provides some more specific constraint specifiers. While reg is generally available on any architecture, explicit registers are highly architecture-specific. E.g. for x86 the general purpose registers eax, ebx, ecx, edx, esi, and edi, among others, can be addressed by their name.

let cmd = 0xd1;
unsafe {
    asm!("out 0x64, eax", in("eax") cmd);
}

In this example we use the out instruction to output the content of the cmd variable to port 0x64. Since the out instruction only accepts eax (and its sub-registers) as an operand, we had to use the eax constraint specifier.

Note that unlike other operand types, explicit register operands cannot be used in the template string: you can't use {} and should write the register name directly instead. Also, they must appear at the end of the operand list after all other operand types.

Consider this example which uses the x86 mul instruction:

fn mul(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;

    unsafe {
        asm!(
            // The x86 mul instruction takes rax as an implicit input and writes
            // the 128-bit result of the multiplication to rdx:rax (high:low).
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
        );
    }

    ((hi as u128) << 64) + lo as u128
}

This uses the mul instruction to multiply two 64-bit inputs with a 128-bit result. The only explicit operand is a register, which we fill from the variable a. The second operand is implicit and must be the rax register, which we fill from the variable b. The lower 64 bits of the result are stored in rax, from which we fill the variable lo. The higher 64 bits are stored in rdx, from which we fill the variable hi.
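The mul example can be cross-checked against ordinary 128-bit arithmetic. The sketch below is an illustration under stated assumptions and is not part of the proposal. It uses the asm! syntax as it was later stabilized, imported from std::arch. The assembly is compiled only when target_arch is x86_64. Other targets fall back to a portable reference. The names mul_wide and mul_reference are hypothetical.

```rust
/// Portable reference: the (high, low) halves of the 128-bit product,
/// i.e. what `mul` leaves in rdx and rax respectively.
pub fn mul_reference(a: u64, b: u64) -> (u64, u64) {
    let wide = u128::from(a) * u128::from(b);
    ((wide >> 64) as u64, wide as u64)
}

/// x86-64 version using the operand layout from the example above.
#[cfg(target_arch = "x86_64")]
pub fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    use std::arch::asm;
    let lo: u64;
    let hi: u64;
    // SAFETY: `mul` only touches the declared operands and the flags.
    unsafe {
        asm!(
            "mul {}",
            in(reg) a,
            inlateout("rax") b => lo,
            lateout("rdx") hi,
            options(pure, nomem, nostack),
        );
    }
    (hi, lo)
}

/// Fallback for other targets, so the sketch compiles everywhere.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    mul_reference(a, b)
}
```

For example, mul_wide(u64::MAX, 2) returns (1, u64::MAX - 1): the product 2^65 - 2 overflows into the high half by exactly one.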

Clobbered registers

In many cases inline assembly will modify state that is not needed as an output. Usually this is either because we have to use a scratch register in the assembly, or instructions modify state that we don't need to further examine. This state is generally referred to as being "clobbered". We need to tell the compiler about this since it may need to save and restore this state around the inline assembly block.

let ebx: u32;
let ecx: u32;

unsafe {
    asm!(
        "cpuid",
        // EAX 4 selects the "Deterministic Cache Parameters" CPUID leaf
        inout("eax") 4 => _,
        // ECX 0 selects cache index 0 (typically the L1 data cache).
        inout("ecx") 0 => ecx,
        lateout("ebx") ebx,
        lateout("edx") _,
    );
}

println!(
    "L1 Cache: {}",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1) * ((ebx & 0xfff) + 1) * (ecx + 1)
);

In the example above we use the cpuid instruction to get the L1 cache size. This instruction writes to eax, ebx, ecx, and edx, but for the cache size we only care about the contents of ebx and ecx.

However, we still need to tell the compiler that eax and edx have been modified so that it can save any values that were in these registers before the asm. This is done by declaring these as outputs but with _ instead of a variable name, which indicates that the output value is to be discarded.
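The field arithmetic inside the println! above can be separated from the cpuid call and checked in isolation. The helper below uses a hypothetical name, cache_size_bytes. Its bit positions are exactly those used by the expression above. Widening to u64 is a choice made here so that large register values cannot overflow. The register values in the usage note are made up for illustration and are not read from a real CPU.

```rust
/// Cache size in bytes from the EBX/ECX values of CPUID leaf 4:
/// ways * partitions * line size * sets, as in the example above.
pub fn cache_size_bytes(ebx: u32, ecx: u32) -> u64 {
    let ebx = u64::from(ebx);
    let ways = (ebx >> 22) + 1;                 // EBX[31:22] + 1
    let partitions = ((ebx >> 12) & 0x3ff) + 1; // EBX[21:12] + 1
    let line_size = (ebx & 0xfff) + 1;          // EBX[11:0] + 1
    let sets = u64::from(ecx) + 1;              // ECX + 1
    ways * partitions * line_size * sets
}
```

For example, EBX = (7 << 22) | 63 and ECX = 63 describe 8 ways, 1 partition, 64-byte lines and 64 sets. That gives 8 * 1 * 64 * 64 = 32768 bytes.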

This can also be used with a general register class (e.g. reg) to obtain a scratch register for use inside the asm code:

// Multiply x by 6 using shifts and adds
let mut x: u64 = 4;
unsafe {
    asm!(
        "mov {tmp}, {x}",
        "shl {tmp}, 1",
        "shl {x}, 2",
        "add {x}, {tmp}",
        x = inout(reg) x,
        tmp = out(reg) _,
    );
}
assert_eq!(x, 4 * 6);

Symbol operands

A special operand type, sym, allows you to use the symbol name of a fn or static in inline assembly code. This allows you to call a function or access a global variable without needing to keep its address in a register.

extern "C" fn foo(arg: i32) {
    println!("arg = {}", arg);
}

fn call_foo(arg: i32) {
    unsafe {
        asm!(
            "call {}",
            sym foo,
            // 1st argument in rdi, which is caller-saved
            inout("rdi") arg => _,
            // All caller-saved registers must be marked as clobbered
            out("rax") _, out("rcx") _, out("rdx") _, out("rsi") _,
            out("r8") _, out("r9") _, out("r10") _, out("r11") _,
            out("xmm0") _, out("xmm1") _, out("xmm2") _, out("xmm3") _,
            out("xmm4") _, out("xmm5") _, out("xmm6") _, out("xmm7") _,
            out("xmm8") _, out("xmm9") _, out("xmm10") _, out("xmm11") _,
            out("xmm12") _, out("xmm13") _, out("xmm14") _, out("xmm15") _,
        )
    }
}

Note that the fn or static item does not need to be public or #[no_mangle]: the compiler will automatically insert the appropriate mangled symbol name into the assembly code.

Register template modifiers

In some cases, fine control is needed over the way a register name is formatted when inserted into the template string. This is needed when an architecture's assembly language has several names for the same register, each typically being a "view" over a subset of the register (e.g. the low 32 bits of a 64-bit register).

By default the compiler will always choose the name that refers to the full register size (e.g. rax on x86-64, eax on x86, etc.).

This default can be overridden by using modifiers on the template string operands, just like you would with format strings:

let mut x: u16 = 0xab;

unsafe {
    asm!("mov {0:h}, {0:l}", inout(reg_abcd) x);
}

assert_eq!(x, 0xabab);

In this example, we use the reg_abcd register class to restrict the register allocator to the 4 legacy x86 registers (ax, bx, cx, dx), whose two low-order bytes can each be addressed independently (e.g. al and ah).

Let us assume that the register allocator has chosen to allocate x in the ax register. The h modifier will emit the register name for the high byte of that register and the l modifier will emit the register name for the low byte. The asm code will therefore be expanded as mov ah, al which copies the low byte of the value into the high byte.
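In plain Rust, mov {0:h}, {0:l} amounts to copying the low byte into the high byte. The sketch below pairs that portable reference with an asm! version. The asm! version uses the syntax that was later stabilized, with asm! imported from std::arch. It compiles only on x86 targets; other targets fall back to the reference. The names dup_low_byte and dup_low_byte_reference are hypothetical.

```rust
/// Portable equivalent of `mov ah, al` on a 16-bit value.
pub fn dup_low_byte_reference(x: u16) -> u16 {
    let low = x & 0x00ff;
    (low << 8) | low
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn dup_low_byte(mut x: u16) -> u16 {
    use std::arch::asm;
    // SAFETY: a single register-to-register mov; memory, the stack and
    // the flags are untouched.
    unsafe {
        // reg_abcd guarantees both an `h` (high byte) and an `l`
        // (low byte) name exist for the allocated register.
        asm!(
            "mov {0:h}, {0:l}",
            inout(reg_abcd) x,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    x
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn dup_low_byte(x: u16) -> u16 {
    dup_low_byte_reference(x)
}
```

dup_low_byte(0xab) yields 0xabab, matching the assertion in the example above. Any existing high byte is simply replaced, so dup_low_byte(0x12cd) yields 0xcdcd.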

If you use a smaller data type (e.g. u16) with an operand and forget to use template modifiers, the compiler will emit a warning and suggest the correct modifier to use.

Options

By default, an inline assembly block is treated the same way as an external FFI function call with a custom calling convention: it may read/write memory, have observable side effects, etc. However, in many cases it is desirable to give the compiler more information about what the assembly code is actually doing so that it can optimize better.

Let's take our previous example of an add instruction:

let mut a: u64 = 4;
let b: u64 = 4;
unsafe {
    asm!(
        "add {0}, {1}",
        inlateout(reg) a, in(reg) b,
        options(pure, nomem, nostack)
    );
}
assert_eq!(a, 8);

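Options can be provided as an optional final argument to the asm! macro. We specified three options here:

  • pure means that the asm code has no observable side effects and that its output depends only on its inputs. This allows the compiler optimizer to call the inline asm fewer times or even eliminate it entirely.
  • nomem means that the asm code does not read or write to memory. By default the compiler will assume that inline assembly can read or write any memory address that is accessible to it (e.g. through a pointer passed as an operand, or a global).
  • nostack means that the asm code does not push any data onto the stack. This allows the compiler to use optimizations such as the stack red zone on x86-64 to avoid stack pointer adjustments.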

These allow the compiler to better optimize code using asm!, for example by eliminating pure asm! blocks whose outputs are not needed.

See the reference for the full list of available options and their effects.

Reference-level explanation

Inline assembler is implemented as an unsafe macro asm!(). The first argument to this macro is a template string literal used to build the final assembly. Additional template string literal arguments may be provided; all of the template string arguments are interpreted as if concatenated into a single template string with \n between them. The following arguments specify input and output operands. When required, options are specified as the final argument.

The following ABNF specifies the general syntax:

dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout"
reg_spec := <register class> / "<explicit register>"
operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_"
reg_operand := dir_spec "(" reg_spec ")" operand_expr
operand := reg_operand / "const" const_expr / "sym" path
option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack" / "att_syntax"
options := "options(" option *["," option] [","] ")"
asm := "asm!(" format_string *("," format_string) *("," [ident "="] operand) ["," options] [","] ")"
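The options production can be exercised with a small hand-written parser. This is a sketch under our own assumptions. parse_options is a hypothetical helper. It ignores whitespace between tokens and checks only that each name is one of those listed in the option production. It performs no semantic checks on combinations of options.

```rust
/// Option names allowed by the `option` production above.
const OPTIONS: [&str; 7] = [
    "pure", "nomem", "readonly", "preserves_flags", "noreturn", "nostack", "att_syntax",
];

/// Parse `options(...)` per: "options(" option *["," option] [","] ")".
/// Returns the option names in the order they were written.
pub fn parse_options(src: &str) -> Result<Vec<String>, String> {
    let inner = src
        .trim()
        .strip_prefix("options(")
        .and_then(|s| s.strip_suffix(')'))
        .ok_or("expected `options(...)`")?;
    let mut parts: Vec<&str> = inner.split(',').map(str::trim).collect();
    // The grammar permits one trailing comma after the last option.
    if parts.len() > 1 && parts.last() == Some(&"") {
        parts.pop();
    }
    let mut out = Vec::new();
    for p in parts {
        // An empty entry means `options()` or a doubled comma.
        if p.is_empty() {
            return Err("expected an option name".to_string());
        }
        if !OPTIONS.contains(&p) {
            return Err(format!("unknown option `{p}`"));
        }
        out.push(p.to_string());
    }
    Ok(out)
}
```

With this sketch, "options(pure, nomem, nostack)" and "options(nomem,)" are accepted. "options()" is rejected, because the grammar requires at least one option, and so is an unknown name such as "options(fast)".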

The macro will initially be supported only on ARM, AArch64, x86, x86-64 and RISC-V targets. Support for more targets may be added in the future. The compiler will emit an error if asm! is used on an unsupported target.

Template string arguments

The assembler template uses the same syntax as format strings (i.e. placeholders are specified by curly braces). The corresponding arguments are accessed in order, by index, or by name. However, implicit named arguments (introduced by RFC #2795) are not supported.

An asm! invocation may have one or more template string arguments; an asm! with multiple template string arguments is treated as if all the strings were concatenated with a \n between them. The expected usage is for each template string argument to correspond to a line of assembly code. All template string arguments must appear before any other arguments.

As with format strings, named arguments must appear after positional arguments. Explicit register operands must appear at the end of the operand list, after named arguments if any.
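The ordering rule can be checked mechanically. In the sketch below, OperandKind and check_operand_order are hypothetical names. Each operand is reduced to the group it belongs to, and template strings and options are left out of the model.

```rust
/// The three operand groups, declared in the order they must appear.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum OperandKind {
    Positional, // e.g. `in(reg) x`
    Named,      // e.g. `x = inout(reg) x`
    Explicit,   // e.g. `in("eax") cmd`
}

/// Positional operands first, then named ones, then explicit registers:
/// the groups may never go backwards in the operand list.
pub fn check_operand_order(ops: &[OperandKind]) -> Result<(), String> {
    for (i, pair) in ops.windows(2).enumerate() {
        if pair[1] < pair[0] {
            return Err(format!(
                "operand {} ({:?}) must not follow a {:?} operand",
                i + 1,
                pair[1],
                pair[0]
            ));
        }
    }
    Ok(())
}
```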

Explicit register operands cannot be used by placeholders in the template string. All other named and positional operands must appear at least once in the template string, otherwise a compiler error is generated.

The exact assembly code syntax is target-specific and opaque to the compiler except for the way operands are substituted into the template string to form the code passed to the assembler.
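The sketch below is a simplified model of template expansion. It shows how operands are substituted and how the at-least-once rule above could be enforced. It assumes each operand has already been rendered to text, either a register name or the value of a const operand. It joins multiple template strings with \n and handles {}, {N}, {name} and the {{ / }} escapes. It deliberately rejects :modifier suffixes, which it does not model. expand_template is a hypothetical name, and this is not how the compiler implements expansion.

```rust
use std::collections::HashSet;

/// Expand template strings given already-rendered operand text.
pub fn expand_template(
    pieces: &[&str],
    positional: &[&str],
    named: &[(&str, &str)],
) -> Result<String, String> {
    let template = pieces.join("\n");
    let mut out = String::new();
    let mut used_pos = HashSet::new();
    let mut used_named = HashSet::new();
    let mut next_implicit = 0;
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `{{` and `}}` are escaped literal braces.
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            '{' => {
                let mut key = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(ch) => key.push(ch),
                        None => return Err("unterminated placeholder".into()),
                    }
                }
                if key.contains(':') {
                    return Err(format!("modifier in `{{{key}}}` is not modeled here"));
                }
                let text = if key.is_empty() {
                    // `{}` takes the next implicit positional operand.
                    let i = next_implicit;
                    next_implicit += 1;
                    used_pos.insert(i);
                    positional.get(i).copied()
                } else if let Ok(i) = key.parse::<usize>() {
                    used_pos.insert(i);
                    positional.get(i).copied()
                } else {
                    used_named.insert(key.clone());
                    named.iter().find(|(n, _)| *n == key).map(|&(_, r)| r)
                };
                out.push_str(text.ok_or(format!("no operand for `{{{key}}}`"))?);
            }
            '}' => return Err("unmatched `}`".into()),
            _ => out.push(c),
        }
    }
    // Every positional and named operand must be referenced at least once.
    if let Some(i) = (0..positional.len()).find(|i| !used_pos.contains(i)) {
        return Err(format!("positional operand {i} is never used"));
    }
    if let Some((n, _)) = named.iter().find(|(n, _)| !used_named.contains(*n)) {
        return Err(format!("named operand `{n}` is never used"));
    }
    Ok(out)
}
```

For instance, suppose the two template strings are "mov {0}, {1}" and "add {0}, {number}", with positional operands rendered as rax and rcx and number as 5. The expansion is then "mov rax, rcx\nadd rax, 5".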

The 4 targets specified in this RFC (x86, ARM, AArch64, RISC-V) all use the assembly code syntax of the GNU assembler (GAS). On x86, the .intel_syntax noprefix mode of GAS is used by default. On ARM, the .syntax unified mode is used. These targets impose an additional restriction on the assembly code: any assembler state (e.g. the current section which can be changed with .section) must be restored to its original value at the end of the asm string. Assembly code that does not conform to the GAS syntax will result in assembler-specific behavior.

Operand type

Several types of operands are supported:

Operand expressions are evaluated from left to right, just like function call arguments. After the asm! has executed, outputs are written to in left to right order. This is significant if two outputs point to the same place: that place will contain the value of the rightmost output.

Register operands

Input and output operands can be specified either as an explicit register or as a register class from which the register allocator can select a register. Explicit registers are specified as string literals (e.g. "eax") while register classes are specified as identifiers (e.g. reg). Using string literals for register names enables support for architectures that use special characters in register names, such as MIPS ($0, $1, etc).

Note that explicit registers treat register aliases (e.g. r14 vs lr on ARM) and smaller views of a register (e.g. eax vs rax) as equivalent to the base register. It is a compile-time error to use the same explicit register for two input operands or two output operands. Additionally, it is also a compile-time error to use overlapping registers (e.g. ARM VFP) in input operands or in output operands.

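Only the following types are allowed as operands for inline assembly:

  • Integers (signed and unsigned)
  • Floating-point numbers
  • Pointers (thin only)
  • Function pointers
  • SIMD vectors (structs defined with #[repr(simd)] and which implement Copy)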

Here is the list of currently supported register classes:

Architecture | Register class | Registers | LLVM constraint code
--- | --- | --- | ---
x86 | reg | ax, bx, cx, dx, si, di, r[8-15] (x86-64 only) | r
x86 | reg_abcd | ax, bx, cx, dx | Q
x86-32 | reg_byte | al, bl, cl, dl, ah, bh, ch, dh | q
x86-64 | reg_byte | al, bl, cl, dl, sil, dil, r[8-15]b, ah*, bh*, ch*, dh* | q
x86 | xmm_reg | xmm[0-7] (x86), xmm[0-15] (x86-64) | x
x86 | ymm_reg | ymm[0-7] (x86), ymm[0-15] (x86-64) | x
x86 | zmm_reg | zmm[0-7] (x86), zmm[0-31] (x86-64) | v
x86 | kreg | k[1-7] | Yk
AArch64 | reg | x[0-28], x30 | r
AArch64 | vreg | v[0-31] | w
AArch64 | vreg_low16 | v[0-15] | x
ARM | reg | r[0-10], r12, r14 | r
ARM (Thumb) | reg_thumb | r[0-7] | l
ARM (ARM) | reg_thumb | r[0-10], r12, r14 | l
ARM | sreg | s[0-31] | t
ARM | sreg_low16 | s[0-15] | x
ARM | dreg | d[0-31] | w
ARM | dreg_low16 | d[0-15] | t
ARM | dreg_low8 | d[0-7] | x
ARM | qreg | q[0-15] | w
ARM | qreg_low8 | q[0-7] | t
ARM | qreg_low4 | q[0-3] | x
RISC-V | reg | x1, x[5-7], x[9-15], x[16-31] (non-RV32E) | r
RISC-V | freg | f[0-31] | f

Note: On x86 we treat reg_byte differently from reg (and reg_abcd) because the compiler can allocate al and ah separately whereas reg reserves the whole register.

Note #2: On x86-64 the high byte registers (e.g. ah) are only available when used as an explicit register. Specifying the reg_byte register class for an operand will always allocate a low byte register.

Additional register classes may be added in the future based on demand (e.g. MMX, x87, etc).

Each register class has constraints on which value types they can be used with. This is necessary because the way a value is loaded into a register depends on its type. For example, on big-endian systems, loading a i32x4 and a i8x16 into a SIMD register may result in different register contents even if the byte-wise memory representation of both values is identical. The availability of supported types for a particular register class may depend on what target features are currently enabled.

Architecture | Register class | Target feature | Allowed types
--- | --- | --- | ---
x86-32 | reg | None | i16, i32, f32
x86-64 | reg | None | i16, i32, f32, i64, f64
x86 | reg_byte | None | i8
x86 | xmm_reg | sse | i32, f32, i64, f64, i8x16, i16x8, i32x4, i64x2, f32x4, f64x2
x86 | ymm_reg | avx | i32, f32, i64, f64, i8x16, i16x8, i32x4, i64x2, f32x4, f64x2, i8x32, i16x16, i32x8, i64x4, f32x8, f64x4
x86 | zmm_reg | avx512f | i32, f32, i64, f64, i8x16, i16x8, i32x4, i64x2, f32x4, f64x2, i8x32, i16x16, i32x8, i64x4, f32x8, f64x4, i8x64, i16x32, i32x16, i64x8, f32x16, f64x8
x86 | kreg | avx512f | i8, i16
x86 | kreg | avx512bw | i32, i64
AArch64 | reg | None | i8, i16, i32, f32, i64, f64
AArch64 | vreg | fp | i8, i16, i32, f32, i64, f64, i8x8, i16x4, i32x2, i64x1, f32x2, f64x1, i8x16, i16x8, i32x4, i64x2, f32x4, f64x2
ARM | reg | None | i8, i16, i32, f32
ARM | sreg | vfp2 | i32, f32
ARM | dreg | vfp2 | i64, f64, i8x8, i16x4, i32x2, i64x1, f32x2
ARM | qreg | neon | i8x16, i16x8, i32x4, i64x2, f32x4
RISC-V32 | reg | None | i8, i16, i32, f32
RISC-V64 | reg | None | i8, i16, i32, f32, i64, f64
RISC-V | freg | f | f32
RISC-V | freg | d | f64

Note: For the purposes of the above table, unsigned types uN, isize, pointers and function pointers are treated as the equivalent integer type (i16/i32/i64 depending on the target).

Note #2: Registers not listed in the table above cannot be used as operands for inline assembly.

If a value is of a smaller size than the register it is allocated in then the upper bits of that register will have an undefined value for inputs and will be ignored for outputs. The only exception is the freg register class on RISC-V where f32 values are NaN-boxed in a f64 as required by the RISC-V architecture.
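NaN-boxing can be spelled out at the bit level. The helpers below have hypothetical names. They follow the RISC-V rule that an f32 held in a 64-bit floating-point register has its upper 32 bits set to all ones. Under the same rule, a value that is not properly boxed is read as the canonical single-precision NaN (0x7fc00000).

```rust
/// Bit pattern of an f32 NaN-boxed in a 64-bit RISC-V float register.
pub fn nan_box(x: f32) -> u64 {
    0xFFFF_FFFF_0000_0000 | u64::from(x.to_bits())
}

/// Read an f32 back out of a 64-bit register value.
pub fn nan_unbox(bits: u64) -> f32 {
    if bits >> 32 == 0xFFFF_FFFF {
        f32::from_bits(bits as u32)
    } else {
        // Improperly boxed: treated as the canonical quiet NaN.
        f32::from_bits(0x7FC0_0000)
    }
}
```

Viewed as an f64, every boxed pattern is itself a NaN, since its exponent bits are all ones and its mantissa is non-zero. For example, nan_box(1.0) is 0xFFFF_FFFF_3F80_0000.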

When separate input and output expressions are specified for an inout operand, both expressions must have the same type. The only exception is if both operands are pointers or integers, in which case they are only required to have the same size. This restriction exists because the register allocators in LLVM and GCC sometimes cannot handle tied operands with different types.
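The tied-operand rule can be written as a small predicate. In this sketch, Ty and tied_types_ok are hypothetical. SIMD types are omitted from the model. pointer_bits stands for the target's pointer width, which is what makes "same size" meaningful for pointers.

```rust
/// Simplified operand types for the tied-operand rule (SIMD omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    /// Signed or unsigned integer, including isize/usize.
    Int { bits: u32, signed: bool },
    /// Thin pointer or function pointer.
    Ptr,
    /// f32 or f64.
    Float(u32),
}

/// Can `input` and `output` be the two halves of one inout operand?
pub fn tied_types_ok(input: Ty, output: Ty, pointer_bits: u32) -> bool {
    // Size of an integer-or-pointer type; None for anything else.
    let int_like_bits = |t: Ty| match t {
        Ty::Int { bits, .. } => Some(bits),
        Ty::Ptr => Some(pointer_bits),
        Ty::Float(_) => None,
    };
    match (int_like_bits(input), int_like_bits(output)) {
        // Both are integers or pointers: only the sizes must agree.
        (Some(a), Some(b)) => a == b,
        // Otherwise the two types must be identical.
        _ => input == output,
    }
}
```

Under this model, a u64 input may be tied to a pointer output on a 64-bit target, and an i32 to a u32. An f32 and a u32, although the same size, may not be tied.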

Register names

Some registers have multiple names. These are all treated by the compiler as identical to the base register name. Here is the list of all supported register aliases:

Architecture | Base register | Aliases
--- | --- | ---
x86 | ax | eax, rax
x86 | bx | ebx, rbx
x86 | cx | ecx, rcx
x86 | dx | edx, rdx
x86 | si | esi, rsi
x86 | di | edi, rdi
x86 | bp | bpl, ebp, rbp
x86 | sp | spl, esp, rsp
x86 | ip | eip, rip
x86 | st(0) | st
x86 | r[8-15] | r[8-15]b, r[8-15]w, r[8-15]d
x86 | xmm[0-31] | ymm[0-31], zmm[0-31]
AArch64 | x[0-30] | w[0-30]
AArch64 | x29 | fp
AArch64 | x30 | lr
AArch64 | sp | wsp
AArch64 | xzr | wzr
AArch64 | v[0-31] | b[0-31], h[0-31], s[0-31], d[0-31], q[0-31]
ARM | r[0-3] | a[1-4]
ARM | r[4-9] | v[1-6]
ARM | r9 | rfp
ARM | r10 | sl
ARM | r11 | fp
ARM | r12 | ip
ARM | r13 | sp
ARM | r14 | lr
ARM | r15 | pc
RISC-V | x0 | zero
RISC-V | x1 | ra
RISC-V | x2 | sp
RISC-V | x3 | gp
RISC-V | x4 | tp
RISC-V | x[5-7] | t[0-2]
RISC-V | x8 | fp, s0
RISC-V | x9 | s1
RISC-V | x[10-17] | a[0-7]
RISC-V | x[18-27] | s[2-11]
RISC-V | x[28-31] | t[3-6]
RISC-V | f[0-7] | ft[0-7]
RISC-V | f[8-9] | fs[0-1]
RISC-V | f[10-17] | fa[0-7]
RISC-V | f[18-27] | fs[2-11]
RISC-V | f[28-31] | ft[8-11]

Note: This table includes registers which are not usable as operands. They are listed here purely for the purposes of compiler diagnostics.
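The alias table combines naturally with the duplicate-register rule from the Register operands section. The check is to resolve every name to its base register and then look for repeats. The sketch below covers only the x86 rows of the table. canonical_x86 and check_explicit_inputs are hypothetical helpers, not compiler APIs.

```rust
/// Map an x86 register name to its base register per the alias table.
pub fn canonical_x86(name: &str) -> Option<String> {
    let n = name.to_ascii_lowercase();
    // ax/eax/rax, bx/ebx/rbx, ..., ip/eip/rip
    for base in ["ax", "bx", "cx", "dx", "si", "di", "bp", "sp", "ip"] {
        if n == base || n == format!("e{base}") || n == format!("r{base}") {
            return Some(base.to_string());
        }
    }
    match n.as_str() {
        "bpl" => return Some("bp".to_string()),
        "spl" => return Some("sp".to_string()),
        "st" | "st(0)" => return Some("st(0)".to_string()),
        _ => {}
    }
    // r8..r15 with an optional b/w/d suffix
    if let Some(rest) = n.strip_prefix('r') {
        let digits = rest.trim_end_matches(&['b', 'w', 'd'][..]);
        if rest.len() - digits.len() <= 1 {
            if let Ok(num) = digits.parse::<u32>() {
                if (8..=15).contains(&num) {
                    return Some(format!("r{num}"));
                }
            }
        }
    }
    // ymmN and zmmN are wider views of xmmN
    for prefix in ["xmm", "ymm", "zmm"] {
        if let Some(rest) = n.strip_prefix(prefix) {
            if let Ok(num) = rest.parse::<u32>() {
                if num <= 31 {
                    return Some(format!("xmm{num}"));
                }
            }
        }
    }
    None
}

/// Reject two explicit input registers that share a base register.
pub fn check_explicit_inputs(regs: &[&str]) -> Result<(), String> {
    let mut seen: Vec<(String, &str)> = Vec::new();
    for &r in regs {
        let base = canonical_x86(r).ok_or_else(|| format!("unknown register `{r}`"))?;
        if let Some((_, prev)) = seen.iter().find(|(b, _)| *b == base) {
            return Err(format!("`{r}` conflicts with `{prev}` (both are `{base}`)"));
        }
        seen.push((base, r));
    }
    Ok(())
}
```

With these helpers, in("eax") and in("rbx") can coexist. in("eax") together with in("rax") is rejected, because both resolve to ax.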

Registers not listed in the table of register classes cannot be used as operands for inline assembly. This includes the following registers:

Architecture | Unsupported register | Reason
--- | --- | ---
All | sp | The stack pointer must be restored to its original value at the end of an asm code block.
All | bp (x86), r11 (ARM), x29 (AArch64), x8 (RISC-V) | The frame pointer cannot be used as an input or output.
x86 | k0 | This is a constant zero register which can't be modified.
x86 | ip | This is the program counter, not a real register.
x86 | mm[0-7] | MMX registers are not currently supported (but may be in the future).
x86 | st([0-7]) | x87 registers are not currently supported (but may be in the future).
AArch64 | xzr | This is a constant zero register which can't be modified.
ARM | pc | This is the program counter, not a real register.
RISC-V | x0 | This is a constant zero register which can't be modified.
RISC-V | gp, tp | These registers are reserved and cannot be used as inputs or outputs.

Template modifiers

The placeholders can be augmented by modifiers which are specified after the : in the curly braces. These modifiers do not affect register allocation, but change the way operands are formatted when inserted into the template string. Only one modifier is allowed per template placeholder.

The supported modifiers are a subset of LLVM's (and GCC's) asm template argument modifiers, but do not use the same letter codes.

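Architecture | Register class | Modifier | Example output | LLVM modifier
--- | --- | --- | --- | ---
x86-32 | reg | None | eax | k
x86-64 | reg | None | rax | q
x86-64 | reg | l | al | b
x86 | reg | x | ax | w
x86 | reg | e | eax | k
x86-64 | reg | r | rax | q
x86-32 | reg_abcd | None | eax | k
x86-64 | reg_abcd | None | rax | q
x86 | reg_abcd | l | al | b
x86 | reg_abcd | h | ah | h
x86 | reg_abcd | x | ax | w
x86 | reg_abcd | e | eax | k
x86-64 | reg_abcd | r | rax | q
x86 | reg_byte | None | al / ah | None
x86 | xmm_reg | None | xmm0 | x
x86 | ymm_reg | None | ymm0 | t
x86 | zmm_reg | None | zmm0 | g
x86 | *mm_reg | x | xmm0 | x
x86 | *mm_reg | y | ymm0 | t
x86 | *mm_reg | z | zmm0 | g
x86 | kreg | None | k1 | None
AArch64 | reg | None | x0 | x
AArch64 | reg | w | w0 | w
AArch64 | reg | x | x0 | x
AArch64 | vreg | None | v0 | None
AArch64 | vreg | v | v0 | None
AArch64 | vreg | b | b0 | b
AArch64 | vreg | h | h0 | h
AArch64 | vreg | s | s0 | s
AArch64 | vreg | d | d0 | d
AArch64 | vreg | q | q0 | q
ARM | reg | None | r0 | None
ARM | sreg | None | s0 | None
ARM | dreg | None | d0 | P
ARM | qreg | None | q0 | q
ARM | qreg | e / f | d0 / d1 | e / f
RISC-V | reg | None | x1 | None
RISC-V | freg | None | f0 | None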

Notes:

  • on ARM and AArch64, the *_low register classes have the same modifiers as their base register class.
  • on ARM e / f: this prints the low or high doubleword register name of a NEON quad (128-bit) register.
  • on x86: our behavior for reg with no modifiers differs from what GCC does. GCC will infer the modifier based on the operand value type, while we default to the full register size.
  • on x86 xmm_reg: the x, t and g LLVM modifiers are not yet implemented in LLVM (they are supported by GCC only), but this should be a simple change.

As stated in the previous section, passing an input value smaller than the register width will result in the upper bits of the register containing undefined values. This is not a problem if the inline asm only accesses the lower bits of the register, which can be done by using a template modifier to use a subregister name in the asm code (e.g. ax instead of rax). Since this is an easy pitfall, the compiler will emit a warning suggesting a template modifier where appropriate given the input type. If all references to an operand already have modifiers, the warning is suppressed for that operand.

Options

Flags are used to further influence the behavior of the inline assembly block. Currently the following options are defined:

The compiler performs some additional checks on options:

Mapping to LLVM IR

The direction specification maps to a LLVM constraint specification as follows (using a reg operand as an example):

If an inout is used where the output type is smaller than the input type then some special handling is needed to avoid LLVM issues. See this bug.

As written this RFC requires architectures to map from Rust constraint specifications to LLVM constraint codes. This is in part for better readability on Rust's side and in part for independence of the backend:

Additionally, the following attributes are added to the LLVM asm statement:

If the noreturn option is set then an unreachable LLVM instruction is inserted after the asm invocation.

Note that alignstack is not currently supported by GCC, so we will need to implement support in GCC if Rust ever gets a GCC back-end.

Supporting back-ends without inline assembly

While LLVM supports inline assembly, rustc may gain alternative backends such as Cranelift or GCC. If a back-end does not support inline assembly natively then we can fall back to invoking an external assembler. The intent is that support for asm! should be independent of the rustc back-end used: it should always work, but with lower performance if the backend does not support inline assembly.

Take the following (AArch64) asm block as an example:

unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
{
    let c;
    asm!("<some asm code>", inout(reg) a, in("x0") b, out("x20") c);
    (a, c)
}

This could be expanded to an external asm file with the following contents:

# Function prefix directives
.section ".text.foo_inline_asm"
.globl foo_inline_asm
.p2align 2
.type foo_inline_asm, @function
foo_inline_asm:

// If necessary, save callee-saved registers to the stack here.
str x20, [sp, #-16]!

// Move the pointer to the argument out of the way since x0 is used.
mov x1, x0

// Load inputs values
ldr w2, [x1, #0]
ldr w0, [x1, #4]

<some asm code>

// Store output values
str w2, [x1, #0]
str w20, [x1, #8]

// If necessary, restore callee-saved registers here.
ldr x20, [sp], #16

ret

# Function suffix directives
.size foo_inline_asm, . - foo_inline_asm

And the following Rust code:

unsafe fn foo(mut a: i32, b: i32) -> (i32, i32)
{
    let c;
    {
        #[repr(C)]
        struct foo_inline_asm_args {
            a: i32,
            b: i32,
            c: i32,
        }
        extern "C" {
            fn foo_inline_asm(args: *mut foo_inline_asm_args);
        }
        let mut args = foo_inline_asm_args {
            a,
            b,
            // c is output-only, so its initial value is never read.
            c: 0,
        };
        foo_inline_asm(&mut args);
        a = args.a;
        c = args.c;
    }
    (a, c)
}
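The calling protocol between the generated Rust wrapper and the external assembly file can be exercised without an assembler. In this sketch, a hypothetical Rust function, fake_foo_inline_asm, stands in for the assembled foo_inline_asm. The statements a += b; c = a * 2 play the role of <some asm code>. The struct is renamed FooInlineAsmArgs to follow Rust naming conventions. The sketch also checks that the #[repr(C)] layout puts a, b and c at byte offsets 0, 4 and 8, the offsets hard-coded in the assembly listing.

```rust
/// Same field order and #[repr(C)] layout as the generated struct.
#[repr(C)]
pub struct FooInlineAsmArgs {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}

/// Byte offsets of a, b and c; the assembly listing hard-codes them
/// as [x1, #0], [x1, #4] and [x1, #8].
pub fn arg_offsets() -> (usize, usize, usize) {
    let args = FooInlineAsmArgs { a: 0, b: 0, c: 0 };
    let base = std::ptr::addr_of!(args) as usize;
    (
        std::ptr::addr_of!(args.a) as usize - base,
        std::ptr::addr_of!(args.b) as usize - base,
        std::ptr::addr_of!(args.c) as usize - base,
    )
}

/// Hypothetical stand-in for the externally assembled foo_inline_asm.
pub unsafe extern "C" fn fake_foo_inline_asm(args: *mut FooInlineAsmArgs) {
    // SAFETY: the caller passes a valid, exclusive pointer.
    let args = unsafe { &mut *args };
    args.a += args.b; // placeholder for <some asm code>
    args.c = args.a * 2;
}

/// Mirrors the generated wrapper: marshal inputs, call, copy outputs back.
pub fn foo(mut a: i32, b: i32) -> (i32, i32) {
    // c is output-only; zero-initializing it avoids uninitialized memory.
    let mut args = FooInlineAsmArgs { a, b, c: 0 };
    // SAFETY: args is a live local that nothing else aliases.
    unsafe { fake_foo_inline_asm(&mut args) };
    a = args.a;
    let c = args.c;
    (a, c)
}
```

With this stand-in, foo(3, 4) returns (7, 14): a is an inout, so its updated value flows back, and c is a pure output.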

Rules for inline assembly

Note: As a general rule, the flags covered by preserves_flags are those which are not preserved when performing a function call.

Drawbacks

Unfamiliarity

This RFC proposes a completely new inline assembly format. It is not possible to just copy examples of GCC-style inline assembly and re-use them. There is however a fairly trivial mapping between the GCC-style and this format that could be documented to alleviate this.

Additionally, this RFC proposes using the Intel asm syntax by default on x86 instead of the AT&T syntax. We believe this syntax will be more familiar to most users, but may be surprising for users used to GCC-style asm.

The cpuid example above would look like this in GCC-style inline assembly:

// GCC doesn't allow directly clobbering an input, we need
// to use a dummy output instead.
int ebx, ecx, discard;
asm (
    "cpuid"
    : "=a"(discard), "=b"(ebx), "=c"(ecx) // outputs
    : "a"(4), "c"(0) // inputs
    : "edx" // clobbers
);
printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
    * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1)
    * (ecx + 1));

Limited set of operand types

The proposed set of operand types is much smaller than that which is available through GCC-style inline assembly. In particular, the proposed syntax does not include any form of memory operands and is missing many register classes.

We chose to keep operand constraints as simple as possible. Memory operands in particular introduce a lot of complexity, since different instructions support different addressing modes. At the same time, the exact rules for memory operands are not very well known (you are only allowed to access the data directly pointed to by the constraint) and are often gotten wrong.

If we discover that there is a demand for a new register class or special operand type, we can always add it later.

Difficulty of support

Inline assembly is a difficult feature to implement in a compiler backend. While LLVM does support it, this may not be the case for alternative backends such as Cranelift (see this issue). We provide a fallback implementation using an external assembler for such backends.

Use of double braces in the template string

Because {} are used to denote operand placeholders in the template string, actual uses of braces in the assembly code need to be escaped with {{ and }}. This is needed for AVX-512 mask registers and ARM register lists.

Post-monomorphization errors

Since the code generated by asm! is only evaluated late in the compiler back-end, errors in the assembly code (e.g. invalid syntax, unrecognized instruction, etc) are reported during code generation unlike every other error generated by rustc. In particular this means that:

However, there is a precedent in Rust for post-monomorphization errors: linker errors. Code that references a non-existent extern symbol will only cause an error at link time, and this can also vary with the optimization level, as dead code elimination may remove the reference to the symbol before it reaches the linker.

Rationale and alternatives

Implement an embedded DSL

Both MSVC and D provide what is best described as an embedded DSL for inline assembly. It is generally close to the system assembler's syntax, but augmented with the ability to directly access variables that are in scope.

// This is D code
int ebx, ecx;
asm {
    mov EAX, 4;
    xor ECX, ECX;
    cpuid;
    mov ebx, EBX;
    mov ecx, ECX;
}
writefln("L1 Cache: %s",
    ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1)
    * ((ebx & 0xfff) + 1) * (ecx + 1));

// This is MSVC C++
int ebx_v, ecx_v;
__asm {
    mov eax, 4
    xor ecx, ecx
    cpuid
    mov ebx_v, ebx
    mov ecx_v, ecx
}
std::cout << "L1 Cache: "
    << ((ebx_v >> 22) + 1) * (((ebx_v >> 12) & 0x3ff) + 1)
        * ((ebx_v & 0xfff) + 1) * (ecx_v + 1)
    << '\n';

While this is very convenient on the user side in that it requires no specification of inputs, outputs, or clobbers, it puts a major burden on the implementation. The DSL needs to be implemented for each supported architecture, and full knowledge of the side effects of every instruction is required.

This huge implementation overhead is likely one of the reasons MSVC only provides this capability for x86, while D provides it for both x86 and x86-64. It should also be noted that the D reference implementation falls slightly short of supporting arbitrary assembly. For example, the lack of access to the RIP register makes certain techniques for writing position-independent code impossible.

As a stop-gap, the LDC implementation of D provides an llvmasm feature that binds it closely to LLVM IR's inline assembly.

We believe it would be unfortunate to put Rust into a similar situation, making certain architectures a second-class citizen with respect to inline assembly.

Provide intrinsics for each instruction

In discussions it is often postulated that providing intrinsics is a better solution to the problems at hand. However, intrinsics fall short particularly where precise timing and full control over the number of generated instructions are required.

Intrinsics are of course still useful and have their place for inserting specific instructions, for example to make sure a loop uses vector instructions rather than relying on auto-vectorization.

However, inline assembly is specifically designed for cases where more control is required. Also, providing an intrinsic for every (potentially obscure) instruction that is needed, e.g. during early system boot in kernel code, is unlikely to scale.

Make the asm! macro return outputs

It has been suggested that the asm! macro could return its outputs, as LLVM IR's inline assembly call does. The benefit is that it is easier to see which variables are being modified, particularly in the case of initialization. On the other hand, this necessarily splits the direction and constraint specification from the variable name, which makes the syntax harder to read overall.

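fn mul(a: u64, b: u64) -> u128 {
    // Hypothetical syntax: asm! evaluates to a tuple of its outputs,
    // here (RAX, RDX) = (low half, high half) of the 128-bit product.
    let (lo, hi): (u64, u64) = unsafe {
        asm!("mul {}", in(reg) a, in("rax") b, lateout("rax"), lateout("rdx"))
    };

    ((hi as u128) << 64) | (lo as u128)
}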
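For comparison, the value the hypothetical mul above should return can be computed in plain Rust, since x86's mul leaves the low half of the 128-bit product in RAX and the high half in RDX. This sketch (function names are illustrative) splits the widened product into those two halves and recombines them, which also shows the parenthesization the shift requires.

```rust
/// Splits the full 128-bit product of `a` and `b` into the (lo, hi) halves
/// that x86's `mul` would leave in RAX and RDX.
fn mul_halves(a: u64, b: u64) -> (u64, u64) {
    // Cannot overflow: the largest product is (2^64 - 1)^2 < 2^128.
    let product = (a as u128) * (b as u128);
    (product as u64, (product >> 64) as u64)
}

/// Recombines the halves. The parentheses are required: `+` binds more
/// tightly than `<<`, and `hi as u128 << 64` is rejected by the parser
/// because `<<` after a type is read as the start of generic arguments.
fn mul_reference(a: u64, b: u64) -> u128 {
    let (lo, hi) = mul_halves(a, b);
    ((hi as u128) << 64) | (lo as u128)
}

fn main() {
    let (lo, hi) = mul_halves(u64::MAX, 2);
    println!("lo = {lo:#x}, hi = {hi:#x}");
    println!("{}", mul_reference(u64::MAX, u64::MAX));
}
```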

Use AT&T syntax by default on x86

x86 is unusual in that there are two widely used dialects for its assembly code: Intel syntax, which is the official syntax for x86 assembly, and AT&T syntax, which is used by GCC (via GAS). There is no functional difference between the two dialects: they support the same functionality with a different syntax. This RFC chooses Intel syntax as the default since it is more widely used and users generally find it easier to read and write.

Validate the assembly code in rustc

There may be some slight differences in the set of assembly code that is accepted by different compiler back-ends (e.g. LLVM's integrated assembler vs using GAS as an external assembler). Examples of such differences are:

While it might be possible for rustc to verify that inline assembly code conforms to a minimal stable subset of the assembly syntax supported by LLVM and GAS, doing so would effectively require rustc to parse the assembly code itself. Implementing a full assembler for all target architectures supported by this RFC is a huge amount of work, most of which is redundant with the work that LLVM has already done in implementing an assembler. As such, this RFC does not propose that rustc perform any validation of the generated assembly code.

Include the target architecture name in asm!

Including the name of the target architecture as part of the asm! invocation could allow IDEs to perform syntax highlighting on the assembly code. However this has several downsides:

Operands before template string

The operands could be placed before the template string, which could make the asm easier to read in some cases. However, we decided against it because the benefits are small and the syntax would no longer mirror that of Rust format strings.

Operands interleaved with template string arguments

An asm directive could contain a series of template string arguments, each followed by the operands referenced in that template string argument. This could potentially simplify long blocks of assembly. However, it could also introduce significant complexity and make the code harder to read, due to the numbering of positional arguments and the possibility of referencing named or numbered arguments other than those grouped with a given template string argument.

Experimentation with such mechanisms could take place in wrapper macros around asm!, rather than in asm! itself.

Prior art

GCC inline assembly

The proposed syntax is very similar to GCC's inline assembly in that it is based on string substitution while leaving actual interpretation of the final string to the assembler. However GCC uses poorly documented single-letter constraint codes and template modifiers. Clang tries to emulate GCC's behavior, but there are still several cases where its behavior differs from GCC's.

The main reason why this is so complicated is that GCC's inline assembly basically exports the raw internals of GCC's register allocator. This has resulted in many internal constraint codes and modifiers being widely used, despite them being completely undocumented.

D & MSVC inline assembly

See the "Implement an embedded DSL" section above.

Unresolved questions

Namespacing the asm! macro

Should the asm! macro be available directly from the prelude as it is now, or should it have to be imported from std::arch::$ARCH::asm? The advantage of the latter is that it would make it explicit that the asm! macro is target-specific, but it would make cross-platform code slightly longer to write.

Future possibilities

Flag outputs

GCC supports a special type of output which allows an asm block to return a bool encoded in the condition flags register. This allows the compiler to branch directly on the condition flag instead of materializing the condition as a bool.

We can support this in the future with a special output operand type.

asm goto

GCC supports passing C labels (the ones used with goto) to an inline asm block, with an indication that the asm code may jump directly to one of these labels instead of leaving the asm block normally.

This could be supported by allowing code blocks to be specified as operand types. The following code would print "a" if the input value is 42, or "b" otherwise.

// Hypothetical syntax: label and fallthrough operands do not exist yet.
asm!(
    "cmp {}, 42",
    "je {}",
    in(reg) val,
    label { println!("a"); },
    fallthrough { println!("b"); }
);

Unique ID per asm

GCC supports %= which generates a unique identifier per instance of an asm block. This is guaranteed to be unique even if the asm block is duplicated (e.g. because of inlining).

We can support this in the future with a special operand type.

const and sym for global_asm!

The global_asm! macro could be extended to support const and sym operands since those can be resolved by simple string substitution. Symbols used in global_asm! will be marked as #[used] to ensure that they are not optimized away by the compiler.

Memory operands

We could support mem as an alternative to specifying a register class which would leave the operand in memory and instead produce a memory address when inserted into the asm string. This would allow generating more efficient code by taking advantage of addressing modes instead of using an intermediate register to hold the computed address.

Shorthand notation for operand names

Should we support some sort of shorthand notation for operand names to avoid needing to write blah = out(reg) blah? For example, if the expression is just a single identifier, we could implicitly allow that operand to be referred to using that identifier.
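Rust's format macros, whose placeholder syntax asm! mirrors, already accept a shorthand of this kind: since Rust 1.58, a placeholder containing a bare identifier captures the in-scope variable of that name. The sketch below shows only the format! behavior (the helper name is illustrative); whether asm! would adopt the same rule is the open question above.

```rust
/// Hypothetical helper demonstrating implicit capture in format strings.
fn describe(blah: u32) -> String {
    // `{blah}` captures the local `blah` implicitly; this is equivalent
    // to format!("blah = {blah}", blah = blah).
    format!("blah = {blah}")
}

fn main() {
    println!("{}", describe(7)); // blah = 7
}
```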

Clobbers for function calls

Sometimes it can be difficult to specify the necessary clobbers for an asm block that performs a function call. In particular, such code is hard to keep forward-compatible: if a future revision of the architecture adds new registers, the compiler may use them, but they will be missing from the asm! clobber list.

One possible solution to this would be to add a clobber(<abi>) operand where <abi> is a calling convention such as "C" or "stdcall". The compiler would then automatically insert the necessary clobbers for a function call to that ABI. Also, clobber(all) could be used to indicate that all registers are clobbered by the asm!.