Macros (and syntax extensions and compiler plugins) - where are we at?

Procedural macros are one of the main reasons Rust programmers use nightly rather than stable Rust, and one of the few areas still causing breaking changes. Recently, part of the story around procedural macros has been coming together, and here I'll explain what you can do today and where we're going in the future.

TL;DR: as a procedural macro author, you're now able to write custom derive implementations which are on a fast track to stabilisation, and to experiment with the beginnings of our long-term plan for general purpose procedural macros. As a user of procedural macros, you'll soon be saying goodbye to bustage in procedural macro libraries caused by changes to compiler internals.

Macros today

Macros are an important part of Rust. They facilitate convenient and safe functionality used by all Rust programmers, such as println! and assert!; they reduce boilerplate, and make implementing traits trivial via derive. They also allow libraries to provide interesting and unusual abstractions.

However, macros are a rough corner - declarative macros (macro_rules macros) have their own system for modularisation, a fiddly syntax for declarations, and some odd rules around hygiene. Procedural macros (aka syntax extensions, compiler plugins) are unstable and painful to use. Despite that, they are used to implement some core parts of the ecosystem, including serialisation, and this causes a great deal of friction for Rust users who have to use nightly Rust or clunky build systems, and either way get hit with regular upstream breakage.

Future of procedural macros

We strongly want to improve this situation. Our current priority is procedural macros, and in particular the parts of the procedural macro system which force Rust users onto nightly or cause recurring upstream errors.

Our goal is an expressive, powerful, and well-designed system that is as stable as the rest of the language. Design work is ongoing in the RFC process. We have accepted RFCs on naming and custom derive/macros 1.1; there are open RFCs on the overall design of procedural macros, attributes, etc.; and there are probably several more to come, in particular about the libraries available to macro authors.

The future, today!

One of the core innovations to the procedural macro system is to base our macros on tokens rather than AST nodes. The AST is a compiler-internal data structure; it must change whenever we add new syntax to the compiler, and often changes even when we don't due to refactoring, etc. That means that macros based on the AST break whenever the compiler changes, i.e., with every new version. In contrast, tokens are mostly stable, and even when they must change, that change can easily be abstracted over.

We have begun the implementation of token-based macros and today you can experiment with them in two ways: by writing custom derive implementations using macros 1.1, and within the existing syntax extension framework. At the moment these two features are quite different, but as part of the stabilisation process they should become more similar to use, and share more of their implementations.

Even better for many users, popular macro-based libraries such as Serde are moving to the new macro system, and crates using these libraries should see fewer errors due to changes to the compiler. Soon, users should be able to use these libraries from stable Rust.

Custom derive (macros 1.1)

The derive attribute lets programmers implement common traits with minimal boilerplate, typically generating an impl based on the annotated data type. This can be used with Eq, Copy, Debug, and many other traits. These implementations of derive are built into the compiler.
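For example, deriving a couple of standard traits (the Point type here is just for illustration):

// `derive` asks the compiler to generate trait impls for the annotated type.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    // These calls use the generated Debug, Clone, and PartialEq impls.
    println!("{:?}", p);
    assert_eq!(p, p.clone());
}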

It would be useful for library authors to provide their own, custom derive implementations. This was previously facilitated by the custom_derive feature; however, that feature is unstable and its implementation is hacky. We now offer a new solution based on procedural macros (often called 'macros 1.1', RFC, tracking issue) which we hope will be on a fast path to stabilisation.

The macros 1.1 solution offers the core token-based framework for declaring and using procedural macros (including a new crate type), but only a bare-bones set of features. In particular, even access to tokens is limited: the only stable API is one providing conversion to and from strings. Keeping the API surface small allows us to make a minimal commitment as we continue iterating on the design. Modularisation and hygiene are not covered; nevertheless, we believe that this API surface is sufficient for custom derive (as evidenced by the fact that Serde was easily ported over).

To write a macros 1.1 custom derive, you need only a function which takes and returns a proc_macro::TokenStream; you then annotate this function with an attribute containing the name of the derive. E.g., #[proc_macro_derive(Foo)] will enable #[derive(Foo)]. To convert between TokenStreams and strings, you use the to_string and parse functions.

There is a new kind of crate (alongside dylib, rlib, etc.) - a proc-macro crate. All macros 1.1 implementations must be in such a crate.

To use a custom derive, you import the macro crate in the usual way using extern crate, and annotate that statement with #[macro_use]. You can then use the derive name in derive attributes.

Example

(These examples will need a pretty recent nightly compiler).

Macro crate (b.rs):

#![feature(proc_macro, proc_macro_lib)]
#![crate_type = "proc-macro"]

extern crate proc_macro;

use proc_macro::TokenStream;

// Implements `#[derive(B)]`: the input is the annotated item as tokens.
#[proc_macro_derive(B)]
pub fn derive(input: TokenStream) -> TokenStream {
    // Round-trip through strings: the only stable API in macros 1.1.
    let input = input.to_string();
    // Emit the original item followed by an impl of `B` for `A`.
    format!("{}\n impl B for A {{ fn b(&self) {{}} }}", input).parse().unwrap()
}

Client crate (client.rs):

#![feature(proc_macro)]

// `#[macro_use]` on the extern crate makes `derive(B)` available.
#[macro_use]
extern crate b;

trait B {
    fn b(&self);
}

#[derive(B)]
struct A;

fn main() {
    let a = A;
    a.b();
}

To build:

rustc b.rs && rustc client.rs -L .

When building with Cargo, the macro crate must set proc-macro = true in the [lib] section of its Cargo.toml.
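For example, the relevant section of the macro crate's Cargo.toml looks something like this:

[lib]
proc-macro = true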

Note that token-based procedural macros are a lower-level feature than the old syntax extensions. The expectation is that authors will not manipulate the tokens directly (as we do in the examples, to keep things short), but use third-party libraries such as Syn or Aster. It is early days for library support as well as language support, so there might be some wrinkles to iron out.
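As a rough sketch of what that looks like, here is the derive from the example above rewritten to parse the input with Syn rather than pasting strings together by hand. The exact function and type names in Syn are still evolving, so treat syn::parse_derive_input and the ident field below as assumptions about the library's API rather than a definitive example.

#![feature(proc_macro, proc_macro_lib)]
#![crate_type = "proc-macro"]

extern crate proc_macro;
extern crate syn;

use proc_macro::TokenStream;

#[proc_macro_derive(B)]
pub fn derive(input: TokenStream) -> TokenStream {
    let source = input.to_string();
    // Parse the annotated item into a structured AST (assumed Syn entry point).
    let ast = syn::parse_derive_input(&source).unwrap();
    // Generate an impl for whatever type was annotated, not a hard-coded `A`.
    format!("{}\n impl B for {} {{ fn b(&self) {{}} }}", source, ast.ident)
        .parse()
        .unwrap()
}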

To see more complete examples, check out derive(new) or serde-derive.

Stability

As mentioned above, we intend for macros 1.1 custom derive to become stable as quickly as possible. We have just entered FCP on the tracking issue, so this feature could be in the stable compiler in as little as 12 weeks. Of course, we want to make sure we get enough experience with the feature in libraries, and fix some bugs and rough edges, before stabilisation. You can track progress in the tracking issue. The old custom derive feature is in FCP for deprecation and will be removed in the near-ish future.

Token-based syntax extensions

If you are already a procedural macro author using the syntax extension mechanism, you might be interested to try out token-based syntax extensions. These are new-style procedural macros with a tokens -> tokens signature, but which use the existing syntax extension infrastructure for declaring and using the macro. This will allow you to experiment with implementing procedural macros without changing the way your macros are used. It is very early days for this kind of macro (the RFC hasn't even been accepted yet) and there will be a lot of evolution from the current feature to the final one. Experimenting now will give you a chance to get a taste for the changes and to influence the long-term design.

To write such a macro, you must use crates which are part of the compiler and thus will always be unstable; eventually you won't have to do this, and we'll be on the path to stabilisation.

Like macros 1.1 custom derive, these procedural macros are functions which return a TokenStream (note that it's actually a different TokenStream implementation, but that will change). Function-like macros take a single TokenStream as input and attribute-like macros take two (one for the annotated item and one for the arguments to the macro). Macro functions must be registered with a plugin_registrar.

To use the macros, you write #![plugin(foo)] to import a macro crate called foo. You can then invoke the macros using attribute (#[foo]) or function-like (bar!(...)) syntax.

Example

Macro crate (foo.rs):

#![feature(plugin, plugin_registrar, rustc_private)]
#![crate_type = "dylib"]

extern crate proc_macro_plugin;
extern crate rustc_plugin;
extern crate syntax;

use proc_macro_plugin::prelude::*;
use syntax::ext::proc_macro_shim::prelude::*;

use rustc_plugin::Registry;
use syntax::ext::base::SyntaxExtension;

#[plugin_registrar]
pub fn plugin_registrar(reg: &mut Registry) {
    // Register `foo` as an attribute-like macro and `bar` as a
    // function-like (bang) macro.
    reg.register_syntax_extension(token::intern("foo"),
                                  SyntaxExtension::AttrProcMacro(Box::new(foo_impl)));
    reg.register_syntax_extension(token::intern("bar"),
                                  SyntaxExtension::ProcMacro(Box::new(bar)));
}

// Attribute macro: receives the attribute's arguments and the annotated item,
// and replaces the item with a new function.
fn foo_impl(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let _source = item.to_string();
    lex("fn f() { println!(\"Good bye!\"); }")
}

// Function-like macro: `bar!()` expands to a `println!` call.
fn bar(_args: TokenStream) -> TokenStream {
    lex("println!(\"Hello!\");")
}

Client crate (client.rs):

#![feature(plugin, custom_attribute)]

#![plugin(foo)]

// `#[foo]` replaces this whole function with one that prints "Good bye!".
#[foo]
fn f() {
    println!("Hello world!");
}

fn main() {
    f();

    bar!();
}

To build:

rustc foo.rs && rustc client.rs -L .

Stability

There is a lot of work still to do; stabilisation is going to be a long haul. Declaring and importing macros should end up very similar to custom derive with macros 1.1 - no plugin registrar. We expect to support full modularisation too. We need to provide, and then iterate on, the library functionality that is available to macro authors from the compiler. We need to implement a comprehensive hygiene scheme. We then need to gain experience and confidence with the system, and probably write some more RFCs.

However! The basic concept of tokens -> tokens macros will remain. So even though the infrastructure for building and declaring macros will change, the macro definitions themselves should be relatively future proof. Mostly, macros will just get easier to write (so less reliance on external libraries, or those libraries can get more efficient) and potentially more powerful.

We intend to deprecate and remove the MultiModifier and MultiDecorator forms of syntax extension. It is likely there will be a long-ish deprecation period to give macro authors opportunity to move to the new system.

Declarative macros

This post has been focused on procedural macros, but we also have plans for declarative macros. However, since these are stable and mostly work, these plans are lower priority and longer-term. The current idea is that there will be a new kind of declarative macro (possibly declared using macro! rather than macro_rules!); macro_rules macros will continue working with no breaking changes. The new declarative macros will be different, but we hope to keep them mostly backwards compatible with existing macros. Expect improvements to naming and modularisation, hygiene, and declaration syntax.

Hat-tips

Thanks to Alex Crichton for driving, designing, and implementing (which, in his usual fashion, was done with eye-watering speed) the macros 1.1 system; Jeffrey Seyfried for making some huge improvements to the compiler and macro system to facilitate the new macro designs; Cameron Swords for implementing a bunch of the TokenStream and procedural macros work; Erick Tryzelaar, David Tolnay, and Sean Griffin for updating Serde and Diesel to use custom derive, and providing valuable feedback on the designs; and to everyone who has contributed feedback and experience as the designs have progressed.

Comments

This post was also posted on users.r-l.o, if you want to comment or discuss, please do so there.