16 December 2015 / Mozilla

Procedural macros, framework

In this blog post I'll lay out what I think procedural macros should look like. I've discussed the syntax elsewhere, and I'll discuss the API we make available to procedural macros in a later blog post. I've previously outlined the whole scope of changes to the macro system, and this repeats (or contradicts) some of that, with more detail.

Kinds of macros

There are two kinds of procedural macros - function-like macros and attribute macros. The former are functions marked with #[macro]; the latter are functions marked with #[macro_attribute]. The former are used as foo!(tokens), the latter as #[foo] or #[foo(tokens)] attached to an AST node following the usual rules for Rust attributes. #![...] attributes are supported with the obvious semantics.

Function-like macros have the signature:

#[macro]
pub fn foo(TokenStream, &mut MacroContext) -> TokenStream

Attribute macros have the signature:

#[macro_attribute]
pub fn foo(Option<TokenStream>, TokenStream, &mut MacroContext) -> TokenStream

The first, optional TokenStream holds the tokens in the attribute itself (the tokens in #[foo(tokens)]). The second TokenStream holds the tokens for the AST node the attribute is attached to. The returned TokenStream replaces that AST node; it can parse into zero or more AST nodes (i.e., we replace both the Modifier and Decorator syntax extensions).

We guarantee that the second TokenStream parses as some AST node; the first TokenStream may or may not parse.

The procedural macro should ensure that the returned TokenStream parses in the context of the macro call.
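To make the two signatures concrete, here is a minimal sketch of one macro of each kind. It assumes stand-in definitions of TokenStream and MacroContext (a plain vector of strings and a counter), since the real types are not designed yet; the macro names parenthesise and keep_if are hypothetical, and the #[macro] and #[macro_attribute] markers appear only in comments because they do not exist in any compiler.

```rust
// Stand-ins for the libmacro types; the real ones are not designed yet.
#[derive(Clone, Debug, PartialEq)]
pub struct TokenStream(pub Vec<String>);

#[derive(Default)]
pub struct MacroContext {
    // Hypothetical state, only here so the macros have something to update.
    pub expansions: usize,
}

// Would carry #[macro] under the proposal; used as parenthesise!(a + b).
pub fn parenthesise(input: TokenStream, cx: &mut MacroContext) -> TokenStream {
    cx.expansions += 1;
    let mut out = vec!["(".to_owned()];
    out.extend(input.0);
    out.push(")".to_owned());
    TokenStream(out)
}

// Would carry #[macro_attribute]; used as #[keep_if] or #[keep_if(tokens)].
pub fn keep_if(
    attr: Option<TokenStream>,
    item: TokenStream,
    cx: &mut MacroContext,
) -> TokenStream {
    cx.expansions += 1;
    match attr {
        // #[keep_if(tokens)]: return the item unchanged (one AST node).
        Some(_) => item,
        // Bare #[keep_if]: return no tokens, so the item expands to zero AST nodes.
        None => TokenStream(Vec::new()),
    }
}

fn main() {
    let mut cx = MacroContext::default();
    let expr = TokenStream(vec!["a".into(), "+".into(), "b".into()]);
    println!("{:?}", parenthesise(expr, &mut cx));

    let item = TokenStream(vec!["struct".into(), "S".into(), ";".into()]);
    println!("{:?}", keep_if(None, item, &mut cx));
    println!("expansions: {}", cx.expansions);
}
```

In both cases it is the macro's job to return tokens that parse where the macro was used: the first returns a parenthesised expression, the second returns either the original item or nothing.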

libmacro

libmacro is a new library added to the standard distribution. It is intended to be used primarily by procedural macros. Its contents will be on the same path to stability as other library crates (i.e., everything starts unstable, and we will stabilise items as their utility is proven). libsyntax will remain, but will be an implementation detail of the compiler - procedural macros should not use it and it will not be stabilised (i.e., stable macros must not use it).

The intention is that libmacro will provide a fairly low-level and bare-bones interface. We expect crates in the ecosystem to provide higher level libraries. In particular, libmacro will have no AST concept. It is expected that crates in the wider ecosystem will provide ASTs, and functionality for parsing tokens to AST and building ASTs.

libmacro will contain token structures (which might be re-exported from libsyntax) and the MacroContext passed to macros. It will contain functionality for:

tokenising a string,
quasi-quoting (text and meta-variables to tokens),
pattern matching on tokens,
interning strings,
generating new identifiers with various hygiene options,
manipulating hygiene information on tokens,
applying macros (including name resolution),
manipulating spans (in particular expansion traces and making new spans) and getting location information from spans,
checking the state of feature gates and setting the feature gates to be used on the code a macro generates,
marking attributes as used,
issuing error messages, etc.,
parsing tokens to key/value pairs as found in attributes.

Much of this functionality will be provided as methods on MacroContext.
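As an illustration of one item from the list, string interning, here is a minimal sketch of an interner. The names Symbol, Interner, intern and resolve are assumptions made for this example and are not part of any proposed libmacro API; in the design above, something like this would more likely sit behind methods on MacroContext.

```rust
use std::collections::HashMap;

// Hypothetical handle for an interned string; cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.lookup.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), sym);
        sym
    }

    // Maps a symbol back to the text it was created from.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("foo");
    let b = interner.intern("bar");
    let c = interner.intern("foo");
    assert_eq!(a, c);
    assert_ne!(a, b);
    println!("{} {}", interner.resolve(a), interner.resolve(b));
}
```

Interning the same text twice yields the same symbol, so comparing identifiers becomes an integer comparison rather than a string comparison.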

I'll cover this API in more detail in a future post. Here I'll cover some aspects of tokens and MacroContext.

tokens

There is a lot of scope for designing an efficient and ergonomic token representation. Here is a starting point.

mod tokens {
    use {Span, HygieneObject, InternedString};

    pub struct TokenStream(Vec<TokenTree>);

    impl TokenStream {
        // Methods for adding and removing tokens, etc.
    }

    pub struct TokenTree {
        pub kind: TokenKind,
        pub span: Span,
        pub hygiene: HygieneObject,
    }

    pub enum TokenKind {
        Delimited(Delimiter, Vec<TokenTree>),
        
        // String includes the commenting tokens.
        Comment(String, CommentKind),
        String(String, StringKind),

        Dollar,
        Semicolon,
        Eof,

        Word(InternedString),
        Punctuation(char),
    }

    pub enum Delimiter {
        // { }
        Brace,
        // ( )
        Parenthesis,
        // [ ]
        Bracket,
    }

    pub enum CommentKind {
        Regular,
        InnerDoc,
        OuterDoc,
    }

    pub enum StringKind {
        Regular,
        Raw(usize),
        Byte,
        RawByte(usize),
    }
}

We could store hygiene information only on TokenKind::Word, rather than on all tokens. We could also possibly store it for ranges of tokens, rather than individual tokens.

I'm not sure if we need to distinguish $ and ;. The former is used for metavariables in macros and the latter for delimiting items, so it is probably useful to distinguish them. It may be worth distinguishing ! and # since they are used in macro uses, though I can't think of an actual use-case.

It may be worth interning string literals. It may not be worth keeping the contents of comments, since this information can be found via the span (we currently do both these things).

I do not believe we need the interpolated non-terminals here.
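To show how nested token trees of this shape might be built and traversed, here is a small sketch using a cut-down copy of the structures above. It assumes placeholder definitions for Span, HygieneObject and InternedString (the post does not define them), keeps only three TokenKind variants, and introduces two hypothetical helpers, tt and count_words.

```rust
// Placeholders for types the post leaves undefined.
#[derive(Clone, Copy, Debug, Default)]
pub struct Span { pub lo: u32, pub hi: u32 }
#[derive(Clone, Copy, Debug, Default)]
pub struct HygieneObject;
pub type InternedString = String;

#[derive(Debug)]
pub enum Delimiter { Brace, Parenthesis, Bracket }

// A reduced TokenKind: only the variants this example needs.
#[derive(Debug)]
pub enum TokenKind {
    Delimited(Delimiter, Vec<TokenTree>),
    Word(InternedString),
    Punctuation(char),
}

#[derive(Debug)]
pub struct TokenTree {
    pub kind: TokenKind,
    pub span: Span,
    pub hygiene: HygieneObject,
}

// Hypothetical convenience constructor with a dummy span and hygiene.
fn tt(kind: TokenKind) -> TokenTree {
    TokenTree { kind, span: Span::default(), hygiene: HygieneObject }
}

// Counts Word tokens, descending into delimited groups.
fn count_words(trees: &[TokenTree]) -> usize {
    trees
        .iter()
        .map(|t| match &t.kind {
            TokenKind::Word(_) => 1,
            TokenKind::Delimited(_, inner) => count_words(inner),
            TokenKind::Punctuation(_) => 0,
        })
        .sum()
}

// Token trees for the source text `foo(a, b)`.
fn call_tokens() -> Vec<TokenTree> {
    vec![
        tt(TokenKind::Word("foo".to_owned())),
        tt(TokenKind::Delimited(
            Delimiter::Parenthesis,
            vec![
                tt(TokenKind::Word("a".to_owned())),
                tt(TokenKind::Punctuation(',')),
                tt(TokenKind::Word("b".to_owned())),
            ],
        )),
    ]
}

fn main() {
    let tokens = call_tokens();
    println!("words: {}", count_words(&tokens));
}
```

The delimited group owns its contents, so any traversal has to recurse; that is the main ergonomic cost of a tree-shaped representation compared with a flat token list.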

We should also expose some helper functions. Note that while I expect we can offer stability guarantees (in time) for the data structures, these functions are only stable in their signatures, not their results. They take a TokenTree or &[TokenTree].

is_keyword
is_reserved_word
is_special_ident
is_operator
is_ident
is_path
metavariables, extracts meta-variables from a TokenStream e.g., foo($x:ident, $y:expr) would give [("x", 2, ident), ("y", 6, expr)] in some structure.

And possibly some convenience functions for building token trees.
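As a rough sketch of what a helper like metavariables might do, the following scans a flat slice of tokens for the pattern $name:kind. It assumes a simplified, hypothetical Tok enum rather than the TokenTree type above, and it reports the index of the $ token within the slice, which is an arbitrary choice made for this example and not necessarily the numbering intended in the list above.

```rust
// Simplified, hypothetical token type for this example only.
#[derive(Clone, Debug, PartialEq)]
pub enum Tok {
    Dollar,
    Word(String),
    Punctuation(char),
}

// Returns (name, index of the `$` token in the slice, fragment kind)
// for every `$name:kind` sequence found.
pub fn metavariables(tokens: &[Tok]) -> Vec<(String, usize, String)> {
    let mut found = Vec::new();
    for (i, window) in tokens.windows(4).enumerate() {
        if let [Tok::Dollar, Tok::Word(name), Tok::Punctuation(':'), Tok::Word(kind)] = window {
            found.push((name.clone(), i, kind.clone()));
        }
    }
    found
}

// The tokens of `$x:ident, $y:expr`.
fn sample() -> Vec<Tok> {
    vec![
        Tok::Dollar,
        Tok::Word("x".to_owned()),
        Tok::Punctuation(':'),
        Tok::Word("ident".to_owned()),
        Tok::Punctuation(','),
        Tok::Dollar,
        Tok::Word("y".to_owned()),
        Tok::Punctuation(':'),
        Tok::Word("expr".to_owned()),
    ]
}

fn main() {
    for (name, index, kind) in metavariables(&sample()) {
        println!("${} at token {} has kind {}", name, index, kind);
    }
}
```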

MacroContext

The MacroContext fulfills several roles:

contains information about the contexts in which the macro is defined and is being expanded,
communicates information about how the macro's result should be used,
provides access to libmacro functionality which requires some kind of state.

MacroContext is probably a struct, but I expect most fields to be private. It could also be a trait.

Contextual information

Methods for accessing:

the span of the macro use (note that the TokenStream arguments also have their own spans),
the span of the macro definition,
the hygiene context for the use-site and definition site (note that these are opaque objects, again the supplied tokens will also have their own hygiene information),
any non-expanded attributes on the macro use,
the kind of AST node that macro expansion must produce,
the feature gates set where the macro is used,
if the macro use is inside a safe or unsafe context,
the delimiters used in a function-like macro.
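One way to picture these accessors is a struct with private fields and read-only methods. The sketch below covers only part of the list, and everything in it is an assumption: the method names, the NodeKind enum, the representation of feature gates as strings, and the placeholder Span type.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span { pub lo: u32, pub hi: u32 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Delimiter { Brace, Parenthesis, Bracket }

// Hypothetical: the kind of AST node the expansion must produce.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NodeKind { Expr, Item, Stmt }

// Fields are private; macros only see the accessor methods.
pub struct MacroContext {
    use_span: Span,
    def_span: Span,
    expected: NodeKind,
    in_unsafe: bool,
    delimiter: Option<Delimiter>,
    features: Vec<String>,
}

impl MacroContext {
    pub fn use_span(&self) -> Span { self.use_span }
    pub fn def_span(&self) -> Span { self.def_span }
    pub fn expected_node(&self) -> NodeKind { self.expected }
    pub fn in_unsafe_context(&self) -> bool { self.in_unsafe }
    // None for attribute macros, which have no call delimiters.
    pub fn delimiter(&self) -> Option<Delimiter> { self.delimiter }
    pub fn feature_enabled(&self, name: &str) -> bool {
        self.features.iter().any(|f| f.as_str() == name)
    }
}

// Builds a context as the compiler might for `foo!{ ... }` in expression position.
fn example_context() -> MacroContext {
    MacroContext {
        use_span: Span { lo: 40, hi: 52 },
        def_span: Span { lo: 0, hi: 30 },
        expected: NodeKind::Expr,
        in_unsafe: false,
        delimiter: Some(Delimiter::Brace),
        features: vec!["box_syntax".to_owned()],
    }
}

fn main() {
    let cx = example_context();
    println!("used at {:?}, defined at {:?}", cx.use_span(), cx.def_span());
    println!("must produce {:?}, delimiter {:?}", cx.expected_node(), cx.delimiter());
    println!("unsafe: {}, box_syntax: {}", cx.in_unsafe_context(), cx.feature_enabled("box_syntax"));
}
```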

Properties of returned tokens

set feature gates for expanded code,
control how hygiene is treated in expansion.

Other functionality

I'll cover much of this in the later blog post on libmacro. The most important functionality includes issuing error messages, warnings, etc. This will include the ability to add notes and suggestions, and to give span information, based on the tokens supplied to the macro.
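A minimal sketch of what issuing diagnostics with notes, suggestions and spans could look like follows. The Diagnostic builder, the Level enum, and the emit and has_errors methods are all hypothetical names chosen for illustration; the actual API is left to the later post.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span { pub lo: u32, pub hi: u32 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Level { Error, Warning }

#[derive(Debug)]
pub struct Diagnostic {
    pub level: Level,
    pub message: String,
    pub span: Span,
    pub notes: Vec<String>,
    // Each suggestion pairs a span with the text proposed to replace it.
    pub suggestions: Vec<(Span, String)>,
}

impl Diagnostic {
    pub fn new(level: Level, message: &str, span: Span) -> Self {
        Diagnostic {
            level,
            message: message.to_owned(),
            span,
            notes: Vec::new(),
            suggestions: Vec::new(),
        }
    }

    pub fn note(mut self, text: &str) -> Self {
        self.notes.push(text.to_owned());
        self
    }

    pub fn suggest(mut self, span: Span, replacement: &str) -> Self {
        self.suggestions.push((span, replacement.to_owned()));
        self
    }
}

// Only the diagnostic-related part of a hypothetical MacroContext.
#[derive(Default)]
pub struct MacroContext {
    diagnostics: Vec<Diagnostic>,
}

impl MacroContext {
    pub fn emit(&mut self, diagnostic: Diagnostic) {
        self.diagnostics.push(diagnostic);
    }

    pub fn has_errors(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == Level::Error)
    }
}

fn main() {
    let mut cx = MacroContext::default();
    // In a real macro this span would be taken from one of the input tokens.
    let bad_token = Span { lo: 12, hi: 15 };
    cx.emit(
        Diagnostic::new(Level::Error, "expected an identifier", bad_token)
            .note("this macro takes a single name")
            .suggest(bad_token, "foo"),
    );
    println!("errors reported: {}", cx.has_errors());
}
```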

Staging

Initially, we will support both new procedural macros and old syntax extensions. Both will be unstable; old syntax extensions should cause a deprecation warning recommending new procedural macros. We will begin stabilisation (after some time) by stabilising the attributes for procedural macros, then gradually stabilising parts of libmacro. Once enough of this is stable (and we have moved over the internal syntax extensions to the new scheme), we should remove support for old syntax extensions.

Alternatives

We currently support an IdentTT syntax extension which is a function-like macro with an identifier between the macro name and opening delimiter. I would like to cease support for it. However, it is potentially useful for emulating items (e.g., my_struct! foo { ... }), etc. Unfortunately, it is unsatisfactory for this since it can't handle modifiers (e.g., pub my_struct! foo ...), and some authors want to accept tokens other than an identifier after the macro use. I propose that we remove the facility for now. It could be added backwards-compatibly in the future by either adding a new attribute (#[macro_with_ident] or something) or by adding more info to the MacroContext.

MacroContext is a bit heavyweight; it might be better design to split it into several smaller traits or structs. However, this would probably make things less ergonomic.
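To illustrate the trade-off, here is a sketch of the split-into-traits alternative. The trait names ExpansionInfo and Diagnostics, their methods, and the helper reject_empty are all hypothetical; the point is only that a helper can then name the capabilities it needs, at the cost of more bounds to write.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Span { pub lo: u32, pub hi: u32 }

// Hypothetical capability: read-only facts about the expansion.
pub trait ExpansionInfo {
    fn use_span(&self) -> Span;
}

// Hypothetical capability: reporting problems to the user.
pub trait Diagnostics {
    fn error(&mut self, span: Span, message: &str);
}

// One concrete context can still implement every capability.
pub struct MacroContext {
    use_span: Span,
    errors: Vec<(Span, String)>,
}

impl ExpansionInfo for MacroContext {
    fn use_span(&self) -> Span { self.use_span }
}

impl Diagnostics for MacroContext {
    fn error(&mut self, span: Span, message: &str) {
        self.errors.push((span, message.to_owned()));
    }
}

// The bounds document exactly which capabilities this helper uses.
fn reject_empty<C: ExpansionInfo + Diagnostics>(cx: &mut C, token_count: usize) -> bool {
    if token_count == 0 {
        let span = cx.use_span();
        cx.error(span, "expected at least one token");
        return false;
    }
    true
}

fn main() {
    let mut cx = MacroContext { use_span: Span { lo: 3, hi: 9 }, errors: Vec::new() };
    println!("accepted: {}", reject_empty(&mut cx, 0));
    println!("errors: {:?}", cx.errors);
}
```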