RFC 1559: attributes-with-literals

lang (syntax | attributes)

Feature Name: attributes_with_literals
Start Date: 2016-03-28
RFC PR: rust-lang/rfcs#1559
Rust Issue: rust-lang/rust#34981

Summary

This RFC proposes accepting literals in attributes by defining the grammar of attributes as:

attr : '#' '!'? '[' meta_item ']' ;

meta_item : IDENT ( '=' LIT | '(' meta_item_inner? ')' )? ;

meta_item_inner : (meta_item | LIT) (',' meta_item_inner)? ;

Note that LIT is a valid Rust literal and IDENT is a valid Rust identifier. The following attributes, among others, would be accepted by this grammar:

#[attr]
#[attr(true)]
#[attr(ident)]
#[attr(ident, 100, true, "true", ident = 100, ident = "hello", ident(100))]
#[attr(100)]
#[attr(enabled = true)]
#[enabled(true)]
#[attr("hello")]
#[repr(C, align = 4)]
#[repr(C, align(4))]

Motivation

At present, literals are only accepted as the value of a key-value pair in attributes. What's more, only string literals are accepted. This means that literals can only appear in forms of #[attr(name = "value")] or #[attr = "value"].

This forces non-string literal values to be awkwardly stringified. For example, while it is clear that something like alignment should be an integer value, the following are disallowed: #[align(4)], #[align = 4]. Instead, we must use something akin to #[align = "4"]. Even #[align("4")] and #[name("name")] are disallowed, forcing key-value pairs or identifiers to be used instead: #[align(size = "4")] or #[name(name)].

In short, the current design forces users to use values of a single type, and thus occasionally the wrong type, in attributes.

Cleaner Attributes

Implementation of this RFC can clean up the following attributes in the standard library:

#![recursion_limit = "64"] => #![recursion_limit = 64] or #![recursion_limit(64)]
#[cfg(all(unix, target_pointer_width = "32"))] => #[cfg(all(unix, target_pointer_width = 32))]

If align were to be added as an attribute, the following are now valid options for its syntax:

#[repr(align(4))]
#[repr(align = 4)]
#[align = 4]
#[align(4)]

Syntax Extensions

As syntax extensions mature and become more widely used, being able to use literals in a variety of positions becomes more important.

Detailed design

To clarify, literals are:

Strings: "foo", r##"foo"##
Byte Strings: b"foo"
Byte Characters: b'f'
Characters: 'a'
Integers: 1, 1{i,u}{8,16,32,64,size}
Floats: 1.0, 1.0f{32,64}
Booleans: true, false

They are defined in the manual and by implementation in the AST.

Implementation of this RFC requires the following changes:

The MetaItemKind structure would need to allow literals as top-level entities:

pub enum MetaItemKind {
    Word(InternedString),
    List(InternedString, Vec<P<MetaItem>>),
    NameValue(InternedString, Lit),
    Literal(Lit),
}

libsyntax (libsyntax/parse/attr.rs) would need to be modified to allow literals as values in k/v pairs and as top-level entities of a list.
Crate metadata encoding/decoding would need to encode and decode literals in attributes.

Drawbacks

This RFC requires a change to the AST and is likely to break syntax extensions using attributes in the wild.

Alternatives

Token trees

An alternative is to allow any tokens inside of an attribute. That is, the grammar could be:

attr : '#' '!'? '[' TOKEN+ ']' ;

where TOKEN is any valid Rust token. The drawback to this approach is that attributes lose any sense of structure. This results in more difficult and verbose attribute parsing, although this could be ameliorated through libraries. Further, this would require almost all of the existing attribute parsing code to change.

The advantage, of course, is that it allows any syntax and is rather future proof. It is also more inline with macro!s.

Allow only unsuffixed literals

This RFC proposes allowing any valid Rust literals in attributes. Instead, the use of literals could be restricted to only those that are unsuffixed. That is, only the following literals could be allowed:

Strings: "foo"
Characters: 'a'
Integers: 1
Floats: 1.0
Booleans: true, false

This cleans up the appearance of attributes will still increasing flexibility.

Allow literals only as values in k/v pairs

Instead of allowing literals in top-level positions, i.e. #[attr(4)], only allow them as values in key value pairs: #[attr = 4] or #[attr(ident = 4)]. This has the nice advantage that it was the initial idea for attributes, and so the AST types already reflect this. As such, no changes would have to be made to existing code. The drawback, of course, is the lack of flexibility. #[repr(C, align(4))] would no longer be valid.

Do nothing

Of course, the current design could be kept. Although it seems that the initial intention was for a form of literals to be allowed. Unfortunately, this idea was scrapped due to release pressure and never revisited. Even the reference alludes to allowing all literals as values in k/v pairs.

Unresolved questions

None that I can think of.