Libmacro

As I outlined in an earlier post, libmacro is a new crate designed to be used by procedural macro authors. It provides the basic API for procedural macros to interact with the compiler. I expect higher level functionality to be provided by library crates. In this post I'll go into a bit more detail about the API I think should be exposed here.

This is a lot of stuff. I've probably missed something. If you use syntax extensions today and do something with libsyntax that would not be possible with libmacro, please let me know!

I previously introduced MacroContext as one of the gateways to libmacro. All procedural macros will have access to a &mut MacroContext.

Tokens

I described the tokens module in the last post, so I won't repeat that here.

There are a few more things I have thought of since. I mentioned TokenStream, which is a sequence of tokens. We should also have TokenSlice, a borrowed slice of tokens (standing to TokenStream as a slice does to its Vec). These should implement the standard methods for sequences; in particular, they support iteration and so can be mapped, etc.
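
Something like the following minimal sketch, which is one plausible reading of that relationship (TokenTree is the token type from the previous post; the Deref impl is my assumption, suggested by the places below which accept a TokenSlice "or something dereferencable to one"):

pub struct TokenStream(Vec<TokenTree>);

// The borrowed counterpart, as [T] is to Vec<T>.
pub type TokenSlice = [TokenTree];

impl std::ops::Deref for TokenStream {
    type Target = TokenSlice;
    fn deref(&self) -> &TokenSlice {
        &self.0
    }
}

With this shape, TokenSlice gets iteration, map, and the rest of the standard sequence methods from slices for free.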

In the earlier blog post, I talked about a token kind called Delimited which contains a delimited sequence of tokens. I would like to rename that to Sequence and add a None variant to the Delimiter enum. The None option is so that we can have blocks of tokens without using delimiters. It will be used for noting unsafety and other properties of tokens. Furthermore, it is useful for macro expansion (replacing the interpolated AST tokens currently present). Although None blocks do not affect scoping, they do affect precedence and parsing.
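
A sketch of the proposed shape; only the None variant is specified above, so the other variant names are assumptions:

pub enum Delimiter {
    // No delimiters: groups tokens without affecting scoping,
    // but does affect precedence and parsing.
    None,
    // Assumed names for the usual bracketing delimiters.
    Paren,
    Bracket,
    Brace,
}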

We should provide an API for creating tokens. By default these have no hygiene information and come with a span which has no place in the source code, but shows the source of the token to be the procedural macro itself (see below for how this interacts with expansion of the current macro). I expect a make_ function for each kind of token. We should also have an API for creating tokens in a given scope (these do the same thing, but with the provided hygiene information). This could be considered an over-rich API, since the hygiene information could be set after construction. However, since hygiene is fiddly and annoying to get right, we should make it as easy as possible to work with.

There should also be a function for creating a token which is just a fresh name. This is useful for creating new identifiers. Although this can be done by interning a string and then creating a token around it, it is used frequently enough to deserve a helper function.
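
Hypothetical signatures for these constructors (the names are my assumptions, following the make_ convention above):

impl MacroContext {
    // No hygiene information; the span marks the token as coming from
    // this procedural macro.
    pub fn make_ident(&self, name: InternedString) -> Token;
    // As above, but with hygiene information for the given scope.
    pub fn make_ident_in_scope(&self, name: InternedString, scope: &Scope) -> Token;
    // Helper: a token wrapping a guaranteed-fresh name.
    pub fn make_fresh_name(&mut self) -> Token;
}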

Emitting errors and warnings

Procedural macros should report errors, warnings, etc. via the MacroContext. They should avoid panicking as much as possible, since a panic will crash the compiler (once catch_panic is available, we should use it to catch such panics and exit gracefully; however, a panic will certainly still mean aborting compilation).

Libmacro will 're-export' DiagnosticBuilder from syntax::errors. I don't actually expect this to be a literal re-export. We will use libmacro's version of Span, for example.

impl MacroContext {
    pub fn struct_error(&self, msg: &str) -> DiagnosticBuilder;
    pub fn error(&self, sp: Option<Span>, msg: &str);
}

pub mod errors {
    pub struct DiagnosticBuilder { ... }
    impl DiagnosticBuilder { ... }
    pub enum ErrorLevel { ... }
}

There should be a macro try_emit!, which reduces a Result<T, ErrStruct> to a T, or calls emit() and then unreachable!() (if the error is not fatal, then it should first be upgraded to a fatal error).
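
A sketch of how try_emit! might be implemented (upgrade_to_fatal is an assumed method on the error type):

macro_rules! try_emit {
    ($result:expr) => {
        match $result {
            Ok(val) => val,
            Err(mut err) => {
                // A non-fatal error is upgraded so that emitting it
                // aborts compilation.
                err.upgrade_to_fatal();
                err.emit();
                unreachable!()
            }
        }
    };
}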

Tokenising and quasi-quoting

The simplest function here is tokenize, which takes a string (&str) and returns a Result<TokenStream, ErrStruct>. The string is treated as source text; on success, the result is the tokenised version of the string. This function will also need to take a MacroContext argument.
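
In signature form, roughly (the parameter order is a guess):

pub fn tokenize(cx: &mut MacroContext, source: &str) -> Result<TokenStream, ErrStruct>;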

We will offer a quasi-quoting macro. This will return a TokenStream (in contrast to today's quasi-quoting, which returns AST nodes), to be precise a Result<TokenStream, ErrStruct>. The quoted code may include metavariables ($x), and these are filled in with variables from the environment. The type of the variables should be either a TokenStream, a TokenTree, or a Result<TokenStream, ErrStruct> (in this last case, if the variable is an error, then it is just returned by the macro). For example,

fn foo(cx: &mut MacroContext, tokens: TokenStream) -> TokenStream {
    quote!(cx, fn foo() { $tokens }).unwrap()
}

The quote! macro can also handle repetition, where the variable corresponding to the metavariable has type [TokenStream] (or is dereferencable to it). In this case, the same syntax as used in macros-by-example can be used. For example, if x: Vec<TokenStream>, then quote!(cx, ($x),*) will produce a TokenStream of a comma-separated list of tokens from the elements of x.
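
For instance, a hedged sketch of building a call with a comma-separated argument list (make_call and the foo tokens are my example, not part of the proposal):

fn make_call(cx: &mut MacroContext, args: Vec<TokenStream>) -> TokenStream {
    // With args = [a, b, c], this produces the tokens `foo(a, b, c)`.
    quote!(cx, foo(($args),*)).unwrap()
}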

Since the tokenize function is a degenerate case of quasi-quoting, an alternative would be to always use quote! and remove tokenize. I believe there is utility in the simple function, and it must be used internally in any case.

These functions and macros should create tokens with spans and hygiene information set as described above for making new tokens. We might also offer versions which take a scope and use that as the context for tokenising.

Parsing helper functions

There are some common patterns for tokens to follow in macros, in particular in those used as arguments for attribute-like macros. We will offer some functions which attempt to parse tokens into these patterns. I expect there will be more of these in time; to start with:

pub mod parsing {
    // Expects `(foo = "bar"),*`
    pub fn parse_keyed_values(tokens: &TokenSlice, cx: &mut MacroContext) -> Result<Vec<(InternedString, String)>, ErrStruct>;
    // Expects `"bar"`
    pub fn parse_string(tokens: &TokenSlice, cx: &mut MacroContext) -> Result<String, ErrStruct>;
}

To be honest, given the token design in the last post, I think parse_string is unnecessary, but I wanted to give more than one example of this kind of function. If parse_keyed_values is the only one we end up with, then that is fine.
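As a hedged sketch, an attribute-like macro invoked as #[my_attr(name = "foo", version = "0.1")] might use parse_keyed_values like this (expand_my_attr is illustrative):

fn expand_my_attr(cx: &mut MacroContext, args: &TokenSlice) -> Result<(), ErrStruct> {
    for (key, value) in parsing::parse_keyed_values(args, cx)? {
        // key is an InternedString and value a String, per the signature above.
        println!("{} = {}", key.get(), value);
    }
    Ok(())
}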

Pattern matching

The goal with the pattern matching API is to allow procedural macros to operate on tokens in the same way as macros-by-example. The pattern language is thus the same as that for macros-by-example.

There is a single macro, which I propose calling matches. Its first argument is the name of a MacroContext. Its second argument is the input, which must be a TokenSlice (or dereferencable to one). The third argument is a pattern definition. The macro produces a Result<T, ErrStruct> where T is the type produced by the pattern arms. If the pattern has multiple arms, then each arm must have the same type. An error is produced if none of the arms in the pattern are matched.

The pattern language follows the language for defining macros-by-example (but is slightly stricter). There are two forms, a single pattern form and a multiple pattern form. If the first character is a { then the pattern is treated as a multiple pattern form; if it starts with ( then as a single pattern form; otherwise it is an error (this causes a panic with a Bug error, as opposed to returning an Err).

The single pattern form is (pattern) => { code }. The multiple pattern form is {(pattern) => { code } (pattern) => { code } ... (pattern) => { code }}. code is any old Rust code which is executed when the corresponding pattern is matched. The pattern follows from macros-by-example - it is a series of characters treated as literals, meta-variables indicated with $, and the syntax for matching multiple variables. Any meta-variables are available as variables in the right-hand side, e.g., $x becomes available as x. These variables have type TokenStream if they appear singly, or Vec<TokenStream> if they appear multiply (or Vec<Vec<TokenStream>>, and so forth).

Examples:

matches!(cx, input, (foo($x:expr) bar) => { quote!(cx, foo_bar($x)).unwrap() }).unwrap()

matches!(cx, input, {
    () => {
        cx.err("No input?");
    }
    (foo($($x:ident),+) bar) => {
        println!("found {} idents", x.len());
        quote!(cx, ($x);*).unwrap()
    }
})

Note that since we match AST items here, our backwards compatibility story is a bit complicated (though hopefully not much more so than with current macros).

Hygiene

The intention of the design is that the actual hygiene algorithm applied is irrelevant. Procedural macros should be able to use the same API if the hygiene algorithm changes (of course the result of applying the API might change). To this end, all hygiene objects are opaque and cannot be directly manipulated by macros.

I propose one module (hygiene) and two types: Context and Scope.

A Context is attached to each token and contains all hygiene information about that token. If two tokens have the same Context, then they may be compared syntactically. The reverse is not true - two tokens can have different Contexts and still be equal. Contexts can only be created by applying the hygiene algorithm and cannot be manipulated, only moved and stored.

MacroContext has a method fresh_hygiene_context for creating a new, fresh Context (i.e., a Context not shared with any other tokens).

MacroContext has a method expansion_hygiene_context for getting the Context where the macro is being expanded. This is equivalent to .expansion_scope().direct_context(), but might be more efficient (and I expect it to be used a lot).
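
In signature form, restating the above:

impl MacroContext {
    // A Context not shared with any other tokens.
    pub fn fresh_hygiene_context(&mut self) -> Context;
    // The Context at the expansion site; equivalent to
    // .expansion_scope().direct_context(), but potentially cheaper.
    pub fn expansion_hygiene_context(&self) -> Context;
}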

A Scope provides information about a position within an AST at a certain point during macro expansion. For example,

fn foo() {
    a
    {
        b
        c
    }
}

a and b will have different Scopes. b and c will have the same Scopes, even if b was written in this position and c is due to macro expansion. However, a Scope may contain more information than just the syntactic scopes, for example, it may contain information about pending scopes yet to be applied by the hygiene algorithm (i.e., information about let expressions which are in scope).

Note that a Scope means a scope in the macro hygiene sense, not the commonly used sense of a scope declared with {}. In particular, each let statement starts a new scope and the items and statements in a function body are in different scopes.

The functions lookup_item_scope and lookup_statement_scope take a MacroContext and a path (represented as a TokenSlice) and return the Scope which that item defines, or an error if the path does not refer to an item or the item does not define a scope of the right kind.

The function lookup_scope_for is similar, but returns the Scope in which an item is declared.

MacroContext has a method expansion_scope for getting the scope in which the current macro is being expanded.

Scope has a method direct_context which returns a Context for items declared directly (as opposed to via macro expansion) in that Scope.

Scope has a method nested which creates a fresh Scope nested within the receiver scope.

Scope has a static method empty for creating an empty scope, that is, one with no scope information at all (note that this is different from a top-level scope).
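
Collected as signatures, restating the prose above (placing the lookup functions in the hygiene module is my assumption):

pub mod hygiene {
    pub struct Context { ... }
    pub struct Scope { ... }

    impl Scope {
        // Context for items declared directly in this scope.
        pub fn direct_context(&self) -> Context;
        // A fresh scope nested within this one.
        pub fn nested(&self) -> Scope;
        // A scope with no scope information at all.
        pub fn empty() -> Scope;
    }

    pub fn lookup_item_scope(cx: &mut MacroContext, path: &TokenSlice) -> Result<Scope, ErrStruct>;
    pub fn lookup_statement_scope(cx: &mut MacroContext, path: &TokenSlice) -> Result<Scope, ErrStruct>;
    // The scope in which the item is declared, rather than the one it defines.
    pub fn lookup_scope_for(cx: &mut MacroContext, path: &TokenSlice) -> Result<Scope, ErrStruct>;
}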

I expect the exact API around Scopes and Contexts will need some work. Scope seems halfway between an intuitive, algorithm-neutral abstraction and the scopes from the sets-of-scopes hygiene algorithm. I would prefer Scope to be more abstract; on the other hand, macro authors may want fine-grained control over hygiene application.

Manipulating hygiene information on tokens:

pub mod hygiene {
    pub fn add(cx: &mut MacroContext, t: &Token, scope: &Scope) -> Token;
    // Maybe unnecessary if we have direct access to Tokens.
    pub fn set(t: &Token, cx: &Context) -> Token;
    // Maybe unnecessary - can use set with cx.expansion_hygiene_context().
    // Also, bad name.
    pub fn current(cx: &MacroContext, t: &Token) -> Token;
}

add adds scope to any context already on t (Context should have a similar method). Note that the implementation is a bit complex - the nature of the Scope might mean we replace the old context completely, or add to it.

Applying hygiene when expanding the current macro

By default, the current macro will be expanded in the standard way, having hygiene applied as expected. Mechanically, hygiene information is added to tokens when the macro is expanded. Assuming the sets of scopes algorithm, scopes (for example, for the macro's definition, and for the introduction) are added to any scopes already present on the token. A token with no hygiene information will thus behave like a token in a macro-by-example macro. Hygiene due to nested scopes created by the macro does not need to be taken into account by the macro author; this is handled at expansion time.

Procedural macro authors may want to customise hygiene application (it is common in Racket), for example, to introduce items that can be referred to by code in the call-site scope.

We must provide an option to expand the current macro without applying hygiene; the macro author must then handle hygiene. For this to work, the macro must be able to access information about the scope in which it is applied (see MacroContext::expansion_scope, above) and to supply a Scope indicating scopes that should be added to tokens following the macro expansion.

pub mod hygiene {
    pub enum ExpansionMode {
        Automatic,
        Manual(Scope),
    }
}

impl MacroContext {
    pub fn set_hygienic_expansion(&mut self, mode: hygiene::ExpansionMode);
}
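
A hedged sketch of a macro opting out of automatic hygiene, re-using the expansion-site scope (my_macro is illustrative):

fn my_macro(cx: &mut MacroContext, input: TokenStream) -> TokenStream {
    // After this call, tokens get only the scope we supply; any further
    // hygiene is this macro's responsibility.
    let scope = cx.expansion_scope();
    cx.set_hygienic_expansion(hygiene::ExpansionMode::Manual(scope));
    // ... manipulate hygiene on the output tokens as needed ...
    input
}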

We may wish to offer other modes for expansion which allow for tweaking hygiene application without requiring full manual application. One possible mode is where the author provides a Scope for the macro definition (rather than using the scope where the macro is actually defined), but hygiene is otherwise applied automatically. We might wish to give the author the option of applying scopes due to the macro definition, but not the introduction scopes.

On a related note, might we want to affect how spans are applied when the current macro is expanded? I can't think of a use case right now, but it seems like something that might be wanted.

Blocks of tokens (that is, a Sequence token) may be marked (not sure how, exactly; perhaps using a distinguished context) such that they are expanded without any hygiene being applied or spans changed. There should be a function in the tokens module for creating such a Sequence from a TokenSlice. The primary motivation for this is to handle the tokens representing the body on which an annotation-like macro is present. For a 'decorator' macro, these tokens are simply passed through, and since they are not touched by the macro, they should appear untouched by it (in terms of hygiene and spans).
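
For example (unhygienic_sequence is a hypothetical name for the constructor just described):

fn my_decorator(cx: &mut MacroContext, body: &TokenSlice) -> TokenStream {
    // Wrap the annotated item so that expansion leaves its hygiene
    // and spans alone.
    let body = tokens::unhygienic_sequence(body);
    // Pass it through unchanged, alongside anything the macro generates.
    quote!(cx, $body).unwrap()
}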

Applying macros

We provide functionality to expand a provided macro or to lookup and expand a macro.

pub mod apply {
    pub fn expand_macro(cx: &mut MacroContext,
                        expansion_scope: Scope,
                        mac: &TokenSlice,
                        macro_scope: Scope,
                        input: &TokenSlice)
                        -> Result<(TokenStream, Scope), ErrStruct>;
    pub fn lookup_and_expand_macro(cx: &mut MacroContext,
                                   expansion_scope: Scope,
                                   mac: &TokenSlice,
                                   input: &TokenSlice)
                                   -> Result<(TokenStream, Scope), ErrStruct>;
}

These functions apply macro hygiene in the usual way, with expansion_scope dictating the scope into which the macro is expanded. Other spans and hygiene information are taken from the tokens. expand_macro takes pending scopes from macro_scope, while lookup_and_expand_macro uses the proper pending scopes. In order to apply the hygiene algorithm, the result of the macro must be parsable. The returned scope will contain pending scopes that can be applied by the macro to subsequent tokens.

We could provide versions that don't take an expansion_scope and use cx.expansion_scope(). Probably unnecessary.
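A hedged usage sketch (the macro path is illustrative, and I assume TokenStream dereferences to TokenSlice as suggested earlier):

fn eagerly_expand(cx: &mut MacroContext, input: &TokenSlice) -> Result<TokenStream, ErrStruct> {
    // Expand into the scope where the current macro is being expanded.
    let scope = cx.expansion_scope();
    // The path of the macro to expand, as tokens.
    let mac = tokenize(cx, "my_crate::my_macro")?;
    let (tokens, _pending) = apply::lookup_and_expand_macro(cx, scope, &mac, input)?;
    Ok(tokens)
}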

pub mod apply {
    pub fn expand_macro_unhygienic(cx: &mut MacroContext,
                                   mac: &TokenSlice,
                                   input: &TokenSlice)
                                   -> Result<TokenStream, ErrStruct>;
    pub fn lookup_and_expand_macro_unhygienic(cx: &mut MacroContext,
                                              mac: &TokenSlice,
                                              input: &TokenSlice)
                                              -> Result<TokenStream, ErrStruct>;
}

The _unhygienic variants expand a macro as in the first functions, but do not apply the hygiene algorithm or change any hygiene information. Any hygiene information on tokens is preserved. I'm not sure if _unhygienic is the right name - using these functions is not necessarily unhygienic, it is just that we are not automatically applying the hygiene algorithm.

Note that all these functions are doing an eager expansion of macros, or in Scheme terms they are local-expand functions.

Looking up items

The function lookup_item takes a MacroContext and a path represented as a TokenSlice and returns a TokenStream for the item referred to by the path, or an error if name resolution failed. I'm not sure where this function should live.
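
In signature form, roughly (module placement to be decided, as noted):

pub fn lookup_item(cx: &mut MacroContext, path: &TokenSlice) -> Result<TokenStream, ErrStruct>;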

Interned strings

pub mod strings {
    pub struct InternedString;

    impl InternedString {
        pub fn get(&self) -> String;
    }

    pub fn intern(cx: &mut MacroContext, s: &str) -> Result<InternedString, ErrStruct>;
    pub fn find(cx: &mut MacroContext, s: &str) -> Result<InternedString, ErrStruct>;
    pub fn find_or_intern(cx: &mut MacroContext, s: &str) -> Result<InternedString, ErrStruct>;
}

intern interns a string and returns a fresh InternedString. find tries to find an existing InternedString.
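
A hedged usage sketch (make_ident is the hypothetical constructor sketched earlier):

fn helper_ident(cx: &mut MacroContext) -> Result<Token, ErrStruct> {
    // Intern once; later comparisons can use the cheap InternedString.
    let name = strings::find_or_intern(cx, "my_helper")?;
    Ok(cx.make_ident(name))
}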

Spans

A span gives information about where in the source code a token is defined. It also gives information about where the token came from (how it was generated, if it was generated code).

There should be a spans module in libmacro, which will include a Span type which can be easily inter-converted with the Span defined in libsyntax. Libsyntax spans currently include information about stability; this will not be present in libmacro spans.

If the programmer does nothing special with spans, then they will be 'correct' by default. There are two important cases: tokens passed to the macro and tokens made fresh by the macro. The former will have the source span indicating where they were written and will include their history. The latter will have no source span and indicate they were created by the current macro. All tokens will have the history relating to expansion of the current macro added when the macro is expanded. At macro expansion, tokens with no source span will be given the macro use-site as their source.

Spans can be freely copied between tokens.

It will probably be useful to make it easy to manipulate spans. For example, rather than pointing at the macro's defining function, point at a helper function where the token is made; or set the origin to the current macro when the token was produced by another macro which should be an implementation detail. I'm not sure what such an interface should look like (and it is probably not necessary in an initial library).

Feature gates

pub mod features {
    pub enum FeatureStatus {
        // The feature gate is allowed.
        Allowed,
        // The feature gate has not been enabled.
        Disallowed,
        // Use of the feature is forbidden by the compiler.
        Forbidden,
    }

    pub fn query_feature(cx: &MacroContext, feature: Token) -> Result<FeatureStatus, ErrStruct>;
    pub fn query_feature_by_str(cx: &MacroContext, feature: &str) -> Result<FeatureStatus, ErrStruct>;
    pub fn query_feature_unused(cx: &MacroContext, feature: Token) -> Result<FeatureStatus, ErrStruct>;
    pub fn query_feature_by_str_unused(cx: &MacroContext, feature: &str) -> Result<FeatureStatus, ErrStruct>;

    pub fn used_feature_gate(cx: &MacroContext, feature: Token) -> Result<(), ErrStruct>;
    pub fn used_feature_by_str(cx: &MacroContext, feature: &str) -> Result<(), ErrStruct>;

    pub fn allow_feature_gate(cx: &MacroContext, feature: Token) -> Result<(), ErrStruct>;
    pub fn allow_feature_by_str(cx: &MacroContext, feature: &str) -> Result<(), ErrStruct>;
    pub fn disallow_feature_gate(cx: &MacroContext, feature: Token) -> Result<(), ErrStruct>;
    pub fn disallow_feature_by_str(cx: &MacroContext, feature: &str) -> Result<(), ErrStruct>;
}

The query_* functions query if a feature gate has been set. They return an error if the feature gate does not exist. The _unused variants do not mark the feature gate as used. The used_ functions mark a feature gate as used, or return an error if it does not exist.

The allow_ and disallow_ functions set a feature gate as allowed or disallowed for the current crate. These functions will only affect feature gates which take effect after parsing and expansion are complete. They do not affect feature gates which are checked during parsing or expansion.

Question: do we need the used_ functions? Could just call query_ and ignore the result.
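
A hedged sketch of the query side (feature_allowed is illustrative):

fn feature_allowed(cx: &MacroContext, name: &str) -> bool {
    // Uses the non-_unused variant, so the gate is also marked as used.
    match features::query_feature_by_str(cx, name) {
        Ok(features::FeatureStatus::Allowed) => true,
        _ => false,
    }
}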

Attributes

We need some mechanism for setting attributes as used. I don't actually know how the unused attribute checking in the compiler works, so I can't spec this area. But, I expect MacroContext to make available some interface for reading attributes on a macro use and marking them as used.