Macro plans, overview
In this post I want to give a bit of an overview of the changes I'm planning to propose for the macro system. I haven't worked out some of the details yet, so this could change a lot.
To summarise, the broad thrusts of the redesign are:
- changes to the way procedural macros work and the parts of the compiler they have access to,
- changes to the hygiene algorithm and to what hygiene is applied to,
- address modularisation issues,
- syntactic changes.
I'll summarise each here, but there will probably be a blog post about each before a proper RFC. At the end of this blog post I'll talk about backwards compatibility.
I'd also like to support macro and ident inter-operation better, as described here.
Procedural macros
Mechanism
I intend to tweak the system of traits and enums, etc. to make procedural macros easier to use. My intention is that there should be a small number of function signatures that can be implemented. Unfortunately there cannot be just one: I believe function-like and attribute-like macros will take different arguments, and I think we need versions for hygienic expansion and for expansion with DIY-hygiene. In the latter case the function must be supplied with some hygiene information in order to do its own hygiene (I'm not certain that is the right approach, though). Although this is not as Rust-y as using traits, I believe the simplicity benefits outweigh the loss in elegance.
All macros will take a set of tokens in and generate a set of tokens out. The token trees should be a simplified version of the compiler's internal token trees to allow procedural macros more flexibility (and forwards compatibility). For attribute-like macros, the code that they annotate must still parse (necessary due to internal attributes, unfortunately), but will be supplied as tokens to the macro itself.
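To make the "tokens in, tokens out" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `TokenTree` and `Delimiter` types, and the two function shapes, are hypothetical and are not the compiler's real types or a proposed API.

```rust
// Hypothetical, simplified token tree. The real one would also carry
// span and hygiene information on each token.
#[derive(Clone, Debug, PartialEq)]
pub enum Delimiter {
    Paren,
    Bracket,
    Brace,
}

#[derive(Clone, Debug, PartialEq)]
pub enum TokenTree {
    Ident(String),
    Literal(String),
    Punct(char),
    Delimited(Delimiter, Vec<TokenTree>),
}

// Assumed shape of a function-like macro: one token stream in, one out.
// This one expands `double!(x)` to `(x) + (x)`.
pub fn double(input: &[TokenTree]) -> Vec<TokenTree> {
    let group = TokenTree::Delimited(Delimiter::Paren, input.to_vec());
    vec![group.clone(), TokenTree::Punct('+'), group]
}

// Assumed shape of an attribute-like macro: it receives the attribute's
// own arguments as well as the tokens of the annotated item.
// This one ignores its arguments and returns the item unchanged.
pub fn passthrough_attr(_args: &[TokenTree], item: &[TokenTree]) -> Vec<TokenTree> {
    item.to_vec()
}
```

The two functions take different arguments, which is why I don't think a single signature can cover both kinds of macro.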
I intend that libsyntax will remain unstable and (stable) procedural macros will not have direct access to it or any other compiler internals. We will create a new crate, libmacro (or something) which will re-export token trees from libsyntax and provide a whole bunch of functionality specifically for procedural macros. This library will take the usual path to stabilisation.
Macros will be able to parse tokens and expand macros in various ways. The output will be some kind of AST. However, after manipulating the AST, it is converted back into tokens to be passed back to the macro expander. Note that this requires us to store hygiene and span information directly in the tokens, not the AST.
I'm not sure exactly what the AST we provide should look like, nor the bounds on what should be in libmacro vs what can be supplied by outside libraries. I would like to start by providing no AST at all and see what the ecosystem comes up with.
It is worth thinking about the stability implications of this proposal. At some point in the future, the procedural macro mechanism and libmacro will be stable. So, a crate using stable Rust can use a crate which provides a procedural macro. At some point later we evolve the language in a non-breaking way which changes the AST (internal to libsyntax). We must ensure that this does not change the structure of the token trees we give to macros. I believe that should not be a problem for a simple enough token tree. However, the procedural macro might expect those tokens to parse in a certain way, which they no longer do, causing the procedural macro to fail and thus compilation to fail. Thus, the stability guarantees we provide users can be subverted by procedural macros. However, I don't think this is possible to prevent. In the most pathological case, the macro could check if the current date is later than a given one and in that case panic. So, we are basically passing the buck about backwards compatibility with the language to the procedural macro authors and the libraries they use. There is an obvious hazard here if a macro is widely used and badly written. I'm not sure if this can be addressed, other than by making sure that libraries exist which make compatibility easy.
Libraries
I hope that the situation for macro authors will be similar to that for other authors: we provide a small but essential standard library (libmacro) and more functionality is provided by the ecosystem via crates.io.
The functionality I expect to see in libmacro should be focused on interaction with the rest of the parser and macro expander, including macro hygiene. I expect it to include:
- interning a string and creating an ident token from a string
- creating and manipulating tokens
- expanding macros (macro_rules and procedural), possibly in different ways
- manipulating the hygiene of tokens
- manipulating expansion traces for spans
- name resolution of module and macro names. Note that I expect these to return token trees, which gives a macro access to the whole program; I'm not sure this is a good idea, since it breaks locality for macros
- check and set feature gates
- mark attributes and imports as used
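As an illustration of the first item in the list, here is a rough sketch of interning a string and creating an ident token from it. The names (`Symbol`, `Interner`, `Token`, `ident_from_str`) are all made up for this example; this is not the libsyntax interner or a proposed libmacro API.

```rust
use std::collections::HashMap;

// Hypothetical handle for an interned string: cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

// Hypothetical interner: maps each distinct string to one Symbol.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    // Returns the existing Symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(s.to_string());
        self.ids.insert(s.to_string(), sym);
        sym
    }

    // Looks up the string a Symbol stands for.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

// Hypothetical token type with only the variant needed here.
#[derive(Clone, Debug, PartialEq)]
pub enum Token {
    Ident(Symbol),
}

// Creates an ident token from a string by interning it first.
pub fn ident_from_str(interner: &mut Interner, s: &str) -> Token {
    Token::Ident(interner.intern(s))
}
```

Two ident tokens created from the same string compare equal because they share a `Symbol`. A real version would also need to attach span and hygiene information to the token.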
The most important external libraries I would like to see would be to provide an AST-like abstraction, parsing, and tools for building and manipulating AST. These already exist (syntex, ASTer), so I am confident we can have good solutions in this space, working towards crates which are provided on crates.io, but are officially blessed (similar to the goals of other libraries).
I would very much like to see quasi-quoting and pattern matching in blessed libraries. These are important tools, the former currently provided by libsyntax. I don't see any reason these must be provided by libmacro, and since quasi-quoting produces AST, they probably can't be (since they would be associated with a particular AST implementation). However, I would like to spend some time improving the current quasi-quoting system, in particular to make it work better with hygiene and expansion traces.
Alternatively, libmacro could provide quasi-quoting which produces token trees, and there is then a second step to produce AST. Since hygiene info will operate at the tokens level, this might be possible.
Pattern matching on tokens should provide functionality similar to that provided by macro_rules!, making writing procedural macros much easier. I'm convinced we need something here, but not sure of the design.
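To give a flavour of what pattern matching on tokens might look like from a library, here is a deliberately tiny sketch. It is an assumption, not a design: `Token`, `Pat` and `match_tokens` are hypothetical names, and the matcher handles only flat sequences with no repetition or nesting.

```rust
use std::collections::HashMap;

// Hypothetical flat token type for the example.
#[derive(Clone, Debug, PartialEq)]
pub enum Token {
    Ident(String),
    Punct(char),
}

// Hypothetical pattern element: match one exact token,
// or bind whatever single token appears to a name.
pub enum Pat {
    Exact(Token),
    Bind(&'static str),
}

// Matches `input` against `pats` position by position.
// Returns the captured tokens by name, or None if the match fails.
pub fn match_tokens(pats: &[Pat], input: &[Token]) -> Option<HashMap<&'static str, Token>> {
    if pats.len() != input.len() {
        return None;
    }
    let mut captures = HashMap::new();
    for (pat, tok) in pats.iter().zip(input) {
        match pat {
            Pat::Exact(expected) if expected == tok => {}
            Pat::Exact(_) => return None,
            Pat::Bind(name) => {
                captures.insert(*name, tok.clone());
            }
        }
    }
    Some(captures)
}
```

A pattern of `Bind("lhs")`, `Exact(Punct('='))`, `Bind("rhs")` plays roughly the role that `$lhs:ident = $rhs:ident` would play in macro_rules!.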
Naming and registration
See the section on modularisation below; the same things apply to procedural macros as to macro_rules macros.
A macro called baz declared in a module bar inside a crate foo could be called using ::foo::bar::baz!(...) or imported using use foo::bar::baz!; and used as baz!(...). Other than a feature flag until procedural macros are stabilised, users of macros need no other annotations. When looking at an extern crate foo statement, the compiler will work out whether we are importing macros.
I expect that functions expected to work as procedural macros would be marked with an attribute (#[macro] or some such). We would also have #[cfg(macro)] for helper functions, etc. Initially, I expect a whole crate must be #[cfg(macro)], but eventually I would like to allow mixing in a crate (just as we allow macro_rules macros in the same crate as normal code).
There would be no need to register macros with the plugin registry.
A vaguely related issue is whether interaction between the macros and the compiler should be via normal function calls (to libmacro) or via IPC. The latter would allow procedural macros to be used without dynamic linking and thus permit a statically linked compiler.
Hygiene
I plan to change the hygiene algorithm we use from mtwt to sets of scopes. This allows us to use hygiene information in name resolution, thus alleviating the 'absolute path' problem in macros. We can also use this information to support hygienic checking of privacy. I'll explain the algorithm and how it will apply to Rust in another blog post. I think this algorithm will be easier for procedural macro authors to work with too.
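Until that post exists, here is a minimal sketch of the core lookup rule in a sets of scopes scheme: every identifier carries a set of scopes, and a reference resolves to the binding with the same name whose scope set is the largest subset of the reference's scope set. The types and names are hypothetical, and the sketch ignores ambiguity (a fuller version would report an error when no single candidate is the best one).

```rust
use std::collections::BTreeSet;

// Hypothetical representation: a scope is just a number.
pub type Scope = u32;
pub type ScopeSet = BTreeSet<Scope>;

// Convenience constructor for a scope set.
pub fn scopes(ids: &[Scope]) -> ScopeSet {
    ids.iter().copied().collect()
}

// A binding of `name`, annotated with the scopes it was introduced in.
pub struct Binding {
    pub name: &'static str,
    pub scopes: ScopeSet,
    pub id: u32,
}

// Resolves a reference: among bindings with the same name whose scopes
// are a subset of the reference's scopes, pick the one with the most scopes.
pub fn resolve(name: &str, reference: &ScopeSet, bindings: &[Binding]) -> Option<u32> {
    bindings
        .iter()
        .filter(|b| b.name == name && b.scopes.is_subset(reference))
        .max_by_key(|b| b.scopes.len())
        .map(|b| b.id)
}
```

For example, suppose user code binds `x` in scope 0 and a macro expansion (which adds scope 1 to everything it introduces) binds its own `x` with scopes {0, 1}. A reference to `x` written by the user has scopes {0} and can only see the user's binding, while a reference to `x` introduced by the macro has scopes {0, 1} and prefers the macro's binding.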
Orthogonally, I want to make all identifiers hygienic, not just variables and labels. I would also like to support hygienic unsafety. I believe both these things are more implementation than design issues.
Modularisation
The goal here is to treat macros the same way as other items, naming via paths and allowing imports. This includes naming of attributes, which will allow paths for naming (e.g., #[foo::bar::baz]). Ordering of macros should also not be important. The mechanism to support this is moving parts of name resolution and privacy checking to macro expansion time. The details of this (and the interaction with sets of scopes hygiene, which essentially gives a new mechanism for name resolution) are involved.
Syntax
These things are nice to have, rather than core parts of the plan. New syntax for procedural macros is covered above.
I would like to fix the expansion issues with arguments and nested macros, see blog post.
I propose that new macros should use macro! rather than macro_rules!.
I would like a syntactic form for macro_rules macros which only matches a single pattern and is more lightweight than the current syntax. The current syntax would still be used where there are multiple patterns. Something like:
macro! foo(...) => {
...
}
Perhaps we drop the => too.
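For comparison, this is what a single-pattern macro looks like in the current, stable macro_rules! syntax. The macro name `square` and the helper `square_of` are just examples.

```rust
// Current syntax: even with one pattern, the macro needs an outer
// block of arms, with the pattern and body nested inside it.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

// Ordinary function using the macro.
pub fn square_of(n: i32) -> i32 {
    square!(n)
}
```

The lightweight form would fold the single pattern into the declaration itself and drop the outer layer of braces.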
We need to allow privacy annotations for macros, but I'm not sure of the best way to do this: pub macro! foo { ... } or macro! pub foo { ... } or something else.
Backwards compatibility
Procedural macros are currently unstable. There will be a lot of breaking changes, but the reward is a path to stability.
macro_rules! is a stable part of the language. It will not break (modulo the usual caveat about bug fixes). The plan is to introduce a whole new macro system around macro!. If you have macros currently called macro!, I guess we break them (we will run a warning cycle for this and try to help anyone who is affected). We will deprecate macro_rules! once macro! is stable. We will track usage with the intention of removing macro_rules at 2.0 or 3.0 or whatever. All macros in the standard libraries will be converted to using macro!. This will be a breaking change, which we will mitigate by continuing to support the old but deprecated versions of the macros. Hopefully, modularisation will support this (needs more thought to be sure). The only change for users of macros will be how the macro is named, not how it is used (modulo new applications of hygiene).
Most existing macro_rules! macros should be valid macro! macros. The only differences will be the use of macro! instead of macro_rules!, and that the new scoping/naming rules may lead to name clashes that didn't exist before (note this is not in itself a breaking change, it is a side effect of using the new system). Macros converted in this way should only break where they take advantage of holes in the current hygiene system. I hope that this is a low enough bar that adoption of macro! by macro_rules! authors will be quick.
Hygiene
There are two backwards compatibility hazards with hygiene, both of which affect only macro_rules! macros: we must emulate the mtwt algorithm with the sets of scopes algorithm, and we must ensure unhygienic name resolution for items which are currently not treated hygienically. In the second case, I think we can simulate unhygienic expansion for types etc. by using the set of scopes for the macro use-site, rather than the proper set. Since only local variables are currently treated hygienically, I believe this means the first case will Just Work. More details on this in a future blog post.