Macros

We're currently planning an overhaul of the syntax extension and macro systems in Rust. I thought it would be a good opportunity to cover some background on macros and some of the issues with the current system. (Note that we're not considering anything really radical for the new systems, but hopefully the improvements will be a little bit more than incremental.) In this blog post I'd like to talk a bit about macros in general. In later posts I'll try to cover some more Rust-specific things and some areas (like hygiene) in more detail. If you're a Lisp (or Rust macro) expert, this post will probably be very dull.

What are macros?

Macros are a syntactic programming language feature. A macro use is expanded according to a macro definition. Macros usually look somewhat like functions; however, macro expansion happens entirely at compile time (never at runtime), and usually in the early stages of compilation: sometimes as a preprocessing step (as in C), sometimes after parsing but before further analysis (as in Rust).

Macro expansion is usually a completely syntactic operation. That is, it operates on the program text (or the AST) without knowledge of the meaning of that text (for example, without type information).

At its simplest, macro expansion is textual substitution. For example (in C):

#define FOO 42
int x = FOO;

is expanded by the preprocessor to

int x = 42;

by simply replacing FOO with 42.

Likewise, with arguments, we just substitute the actual arguments into the macro definition, and then the macro into the source:

#define MIN(X, Y)  ((X) < (Y) ? (X) : (Y))
int x = MIN(10, 20);

expands to

int x = ((10) < (20) ? (10) : (20));

After expansion, the expanded program is compiled just like a regular program.

What is macro hygiene?

The naive implementation of macros described above can easily go wrong. For example:

static int a = 42;
#define ADD_A(X)  ((X) + a)

void foo() {
    int a = 0;
    int x = ADD_A(10);
}

You might expect x to be 52 at runtime, but it isn't; it is 10. That's because the expansion is:

static int a = 42;

void foo() {
    int a = 0;
    int x = ((10) + a);
}

There is nothing special about a; it is just a name. So the usual scoping rules apply, and we get the a in scope at the macro use site, not the one at the macro definition site as you might expect.

These kinds of unexpected results arise because C macros are unhygienic. A hygienic macro system (as in Scheme or Rust) would preserve the scoping of the macro definition, so post-expansion, the a from the macro would still refer to the global a rather than the a in foo.
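Here is a rough Rust analogue of the C example, as a minimal sketch rather than a definitive demonstration. The names add_a and hygienic_sum are made up for illustration, and the outer a is a local variable rather than a global, because Rust does not let a let binding shadow a static. With macro_rules, the a in the macro body keeps referring to the binding that was in scope where the macro was defined:

```rust
// Hypothetical names throughout; this only illustrates the scoping rule.
#[allow(unused_variables)]
fn hygienic_sum() -> i32 {
    let a = 42;

    // The `a` in the macro body refers to the `a` in scope here,
    // at the macro definition site.
    macro_rules! add_a {
        ($x:expr) => {
            $x + a
        };
    }

    {
        // A different `a` at the macro use site; the macro body does not see it.
        let a = 0;
        add_a!(10)
    }
}

fn main() {
    println!("{}", hygienic_sum()); // prints 52
}
```

The unhygienic C version gives 10 because the use-site a wins; here the definition-site a wins and the result is 52.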

This is the simplest kind of macro hygiene. Once we get into the complexities of hygiene, it turns out there is no great definition. Hygiene applies to variables declared inside a macro (can they be referenced outside it?) as well as applying in some sense to aspects such as privacy. Implementing hygiene gets complex when macro definitions can include macro uses and further macro definitions. To make things even worse, sometimes perfect hygiene is too strong and you want to be able to bend the rules in a (hopefully) safe way.
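As a small sketch of the "variables declared inside a macro" case, again in Rust with macro_rules: the macro add_hundred and its internal variable tmp are hypothetical names chosen to collide with a variable at the use site on purpose.

```rust
// Hypothetical macro; its internal `tmp` deliberately clashes with the caller's name.
macro_rules! add_hundred {
    ($e:expr) => {{
        // This `tmp` is introduced by the macro and is not visible to `$e`.
        let tmp = 100;
        $e + tmp
    }};
}

fn caller_tmp_plus_hundred() -> i32 {
    let tmp = 1;
    // The argument `tmp` is the caller's variable, not the macro's.
    add_hundred!(tmp)
}

fn main() {
    println!("{}", caller_tmp_plus_hundred()); // prints 101
}
```

With naive textual substitution the argument would be captured by the macro's own tmp and the result would be 200; with hygiene the two variables stay distinct.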

How can macros be implemented?

Macros can be implemented as simple textual substitution, by manipulating tokens after lexing, or by manipulating the AST after parsing. Conceptually, though, we simply replace a macro use with the definition, whether the use and definition are represented by text, tokens, or AST nodes. There are some interesting details about exactly how lexing, parsing, and macro expansion interact. But the most interesting implementation aspect is the algorithm used to maintain hygiene (which I'll cover in a later post).

How macros are implemented also depends on how macros are defined. The simple examples I gave above just substitute the macro definition for the macro use. macro_rules macros in Rust and syntax-rules macros in Scheme allow for pattern matching of arguments in the macro definition, so different code is substituted for the macro use depending on the arguments.
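To make the pattern-matching idea concrete, here is a minimal macro_rules sketch. The macro describe and the helper functions are hypothetical; the point is only that each arm pairs a pattern with the code to substitute, and the arm chosen depends on the arguments at the use site.

```rust
// Hypothetical macro: each arm is a pattern plus the code substituted when it matches.
macro_rules! describe {
    () => {
        String::from("nothing")
    };
    ($x:expr) => {
        format!("one: {}", $x)
    };
    ($x:expr, $y:expr) => {
        format!("two: {} and {}", $x, $y)
    };
}

// Small wrappers so each expansion can be inspected separately.
fn no_args() -> String {
    describe!()
}

fn one_arg() -> String {
    describe!(1)
}

fn two_args() -> String {
    describe!(1, 2)
}

fn main() {
    println!("{}", no_args()); // prints "nothing"
    println!("{}", one_arg()); // prints "one: 1"
    println!("{}", two_args()); // prints "two: 1 and 2"
}
```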

How macros are defined also affects how and when they are lexed and parsed. (C macros are not parsed until substitution is completely finished. Rust macros are lexed into tokens before expansion and parsed afterwards.)
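One visible consequence of the representation, sketched in Rust under some assumptions: square is a hypothetical macro whose argument is matched as a whole expression, and textual_square is a hypothetical helper that models a C-style substitution of the body X * X on unparsed text.

```rust
// Hypothetical macro: `$x:expr` is matched as a single expression,
// so it keeps its grouping when substituted into the body.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

// Models purely textual substitution of the body `X * X`,
// roughly what a C-style preprocessor would produce.
fn textual_square(arg: &str) -> String {
    "X * X".replace("X", arg)
}

fn square_of_sum() -> i32 {
    square!(1 + 2)
}

fn main() {
    println!("{}", textual_square("1 + 2")); // prints "1 + 2 * 1 + 2"
    println!("{}", square_of_sum()); // prints 9
}
```

Parsed as ordinary arithmetic, the textual result would evaluate to 5 rather than 9, which is why C macro bodies conventionally wrap each argument in parentheses, as MIN does above.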

Procedural macros

The macros described so far simply replace a macro use with the macro definition. The macro expander might manipulate the macro definition (to implement hygiene or pattern matching), but the macro definition does not affect the expansion other than by providing input. In a procedural macro system, each macro is defined as a program. When a macro use is encountered, the macro is executed (still at compile time) with the macro arguments as input. The macro use is replaced by the result of execution.

For example (using a made-up macro language that should be understandable to Rust programmers; note, though, that Rust procedural macros work nothing like this):

proc_macro! foo(x) {
    let mut result = String::new();
    for _ in 0..10 {
        result.push_str(x);
    }
    result
}

fn main() {
    let a = "foo!(bar)"; // Hand-waving about string literals.
}

will expand to

fn main() {
    let a = "barbarbarbarbarbarbarbarbarbar";
}

A procedural macro is a generalisation of the syntactic macros described so far. One could imagine implementing a syntactic macro as a procedural macro by returning the text of the syntactic macro after manually substituting the arguments.
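As a sketch of that idea, a procedural macro can be modelled as an ordinary function from argument text to output text. This is only a model in plain Rust, not how Rust procedural macros actually work, and the names foo_macro and min_macro are hypothetical. The first function mirrors the made-up foo macro; the second implements the syntactic MIN macro from earlier by substituting its arguments by hand.

```rust
// Hypothetical model: a "procedural macro" is a function from argument text to output text.

// Mirrors the made-up `foo` macro: repeats its argument ten times.
fn foo_macro(x: &str) -> String {
    let mut result = String::new();
    for _ in 0..10 {
        result.push_str(x);
    }
    result
}

// Implements the syntactic MIN macro procedurally, by manually
// substituting the arguments into the text of the definition.
fn min_macro(x: &str, y: &str) -> String {
    format!("(({0}) < ({1}) ? ({0}) : ({1}))", x, y)
}

fn main() {
    println!("{}", foo_macro("bar"));
    println!("{}", min_macro("10", "20")); // prints "((10) < (20) ? (10) : (20))"
}
```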