RFC 0218: empty-struct-with-braces

lang (data-types | syntax | patterns | expressions | product-types)

Start Date: (fill me in with today's date, 2014-08-28)
RFC PR: rust-lang/rfcs#218
Rust Issue: rust-lang/rust#218

Summary

When a struct type S has no fields (a so-called "empty struct"), allow it to be defined via either struct S; or struct S {}. When defined via struct S;, allow instances of it to be constructed and pattern-matched via either S or S {}. When defined via struct S {}, require instances to be constructed and pattern-matched solely via S {}.

Motivation

Today, when writing code, one must treat an empty struct as a special case, distinct from structs that include fields. That is, one must write code like this:

struct S2 { x1: int, x2: int }
struct S0; // kind of different from the above.

let s2 = S2 { x1: 1, x2: 2 };
let s0 = S0; // kind of different from the above.

match (s2, s0) {
    (S2 { x1: y1, x2: y2 },
     S0) // you can see my pattern here
     => { println!("Hello from S2({}, {}) and S0", y1, y2); }
}

While this yields code that is relatively free of extraneous curly-braces, this special case handling of empty structs presents problems for two cases of interest: automatic code generators (including, but not limited to, Rust macros) and conditionalized code (i.e. code with cfg attributes; see the CFG problem appendix). The heart of the code-generator argument is: Why force all to-be-written code-generators and macros with special-case handling of the empty struct case (in terms of whether or not to include the surrounding braces), especially since that special case is likely to be forgotten (yielding a latent bug in the code generator).

The special case handling of empty structs is also a problem for programmers who actively add and remove fields from structs during development; such changes cause a struct to switch from being empty and non-empty, and the associated revisions of changing removing and adding curly braces is aggravating (both in effort revising the code, and also in extra noise introduced into commit histories).

This RFC proposes an approach similar to the one we used circa February 2013, when both S0 and S0 { } were accepted syntaxes for an empty struct. The parsing ambiguity that motivated removing support for S0 { } is no longer present (see the Ancient History appendix). Supporting empty braces in the syntax for empty structs is easy to do in the language now.

Detailed design

There are two kinds of empty structs: Braced empty structs and flexible empty structs. Flexible empty structs are a slight generalization of the structs that we have today.

Flexible empty structs are defined via the syntax struct S; (as today).

Braced empty structs are defined via the syntax struct S { } ("new").

Both braced and flexible empty structs can be constructed via the expression syntax S { } ("new"). Flexible empty structs, as today, can also be constructed via the expression syntax S.

Both braced and flexible empty structs can be pattern-matched via the pattern syntax S { } ("new"). Flexible empty structs, as today, can also be pattern-matched via the pattern syntax S.

Braced empty struct definitions solely affect the type namespace, just like normal non-empty structs. Flexible empty structs affect both the type and value namespaces.

As a matter of style, using braceless syntax is preferred for constructing and pattern-matching flexible empty structs. For example, pretty-printer tools are encouraged to emit braceless forms if they know that the corresponding struct is a flexible empty struct. (Note that pretty printers that handle incomplete fragments may not have such information available.)

There is no ambiguity introduced by this change, because we have already introduced a restriction to the Rust grammar to force the use of parentheses to disambiguate struct literals in such contexts. (See Rust RFC 25).

The expectation is that when migrating code from a flexible empty struct to a non-empty struct, it can start by first migrating to a braced empty struct (and then have a tool indicate all of the locations where braces need to be added); after that step has been completed, one can then take the next step of adding the actual field.

Drawbacks

Some people like "There is only one way to do it." But, there is precendent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity; see the Precedent for flexible syntax in Rust appendix. Also, see the Always Require Braces alternative below.

I have attempted to summarize the previous discussion from RFC PR 147 in the Recent History appendix; some of the points there include drawbacks to this approach and to the Always Require Braces alternative.

Alternatives

Always Require Braces

Alternative 1: "Always Require Braces". Specifically, require empty curly braces on empty structs. People who like the current syntax of curly-brace free structs can encode them this way: enum S0 { S0 } This would address all of the same issues outlined above. (Also, the author (pnkfelix) would be happy to take this tack.)

The main reason not to take this tack is that some people may like writing empty structs without braces, but do not want to switch to the unary enum version described in the previous paragraph. See "I wouldn't want to force noisier syntax ..." in the Recent History appendix.

Status quo

Alternative 2: Status quo. Macros and code-generators in general will need to handle empty structs as a special case. We may continue hitting bugs like CFG parse bug. Some users will be annoyed but most will probably cope.

Synonymous in all contexts

Alternative 3: An earlier version of this RFC proposed having struct S; be entirely synonymous with struct S { }, and the expression S { } be synonymous with S.

This was deemed problematic, since it would mean that S { } would put an entry into both the type and value namespaces, while S { x: int } would only put an entry into the type namespace. Thus the current draft of the RFC proposes the "flexible" versus "braced" distinction for empty structs.

Never synonymous

Alternative 4: Treat struct S; as requiring S at the expression and pattern sites, and struct S { } as requiring S { } at the expression and pattern sites.

This in some ways follows a principle of least surprise, but it also is really hard to justify having both syntaxes available for empty structs with no flexibility about how they are used. (Note again that one would have the option of choosing between enum S { S }, struct S;, or struct S { }, each with their own idiosyncrasies about whether you have to write S or S { }.) I would rather adopt "Always Require Braces" than "Never Synonymous"

Empty Tuple Structs

One might say "why are you including support for curly braces, but not parentheses?" Or in other words, "what about empty tuple structs?"

The code-generation argument could be applied to tuple-structs as well, to claim that we should allow the syntax S0(). I am less inclined to add a special case for that; I think tuple-structs are less frequently used (especially with many fields); they are largely for ad-hoc data such as newtype wrappers, not for code generators.

Note that we should not attempt to generalize this RFC as proposed to include tuple structs, i.e. so that given struct S0 {}, the expressions T0, T0 {}, and T0() would be synonymous. The reason is that given a tuple struct struct T2(int, int), the identifier T2 is already bound to a constructor function:

fn main() {
    #[deriving(Show)]
    struct T2(int, int);

    fn foo<S:std::fmt::Show>(f: |int, int| -> S) {
        println!("Hello from {} and {}", f(2,3), f(4,5));
    }
    foo(T2);
}

So if we were to attempt to generalize the leniency of this RFC to tuple structs, we would be in the unfortunate situation given struct T0(); of trying to treat T0 simultaneously as an instance of the struct and as a constructor function. So, the handling of empty structs proposed by this RFC does not generalize to tuple structs.

(Note that if we adopt alternative 1, Always Require Braces, then the issue of how tuple structs are handled is totally orthogonal -- we could add support for struct T0() as a distinct type from struct S0 {}, if we so wished, or leave it aside.)

Unresolved questions

None

Appendices

The CFG problem

A program like this works today:

fn main() {
    #[deriving(Show)]
    struct Svaries {
        x: int,
        y: int,

        #[cfg(zed)]
        z: int,
    }

    let s = match () {
        #[cfg(zed)]      _ => Svaries { x: 3, y: 4, z: 5 },
        #[cfg(not(zed))] _ => Svaries { x: 3, y: 4 },
    };
    println!("Hello from {}", s)
}

Observe what happens when one modifies the above just a bit:

    struct Svaries {
        #[cfg(eks)]
        x: int,
        #[cfg(why)]
        y: int,

        #[cfg(zed)]
        z: int,
    }

Now, certain cfg settings yield an empty struct, even though it is surrounded by braces. Today this leads to a CFG parse bug when one attempts to actually construct such a struct.

If we want to support situations like this properly, we will probably need to further extend the cfg attribute so that it can be placed before individual fields in a struct constructor, like this:

// You cannot do this today,
// but maybe in the future (after a different RFC)
let s = Svaries {
    #[cfg(eks)] x: 3,
    #[cfg(why)] y: 4,
    #[cfg(zed)] z: 5,
};

Supporting such a syntax consistently in the future should start today with allowing empty braces as legal code. (Strictly speaking, it is not necessary that we add support for empty braces at the parsing level to support this feature at the semantic level. But supporting empty-braces in the syntax still seems like the most consistent path to me.)

Ancient History

A parsing ambiguity was the original motivation for disallowing the syntax S {} in favor of S for constructing an instance of an empty struct. The ambiguity and various options for dealing with it were well documented on the rust-dev thread. Both syntaxes were simultaneously supported at the time.

In particular, at the time that mailing list thread was created, the code match match x {} ... would be parsed as match (x {}) ..., not as (match x {}) ... (see Rust PR 5137); likewise, if x {} would be parsed as an if-expression whose test component is the struct literal x {}. Thus, at the time of Rust PR 5137, if the input to a match or if was an identifier expression, one had to put parentheses around the identifier to force it to be interpreted as input to the match/if, and not as a struct constructor.

Of the options for resolving this discussed on the mailing list thread, the one selected (removing S {} construction expressions) was chosen as the most expedient option.

At that time, the option of "Place a parser restriction on those contexts where { terminates the expression and say that struct literals cannot appear there unless they are in parentheses." was explicitly not chosen, in favor of continuing to use the disambiguation rule in use at the time, namely that the presence of a label (e.g. S { a_label: ... }) was the way to distinguish a struct constructor from an identifier followed by a control block, and thus, "there must be one label."

Naturally, if the construction syntax were to be disallowed, it made sense to also remove the struct S {} declaration syntax.

Things have changed since the time of that mailing list thread; namely, we have now adopted the aforementioned parser restriction Rust RFC 25. (The text of RFC 25 does not explicitly address match, but we have effectively expanded it to include a curly-brace delimited block of match-arms in the definition of "block".) Today, one uses parentheses around struct literals in some contexts (such as for e in (S {x: 3}) { ... } or match (S {x: 3}) { ... }

Note that there was never an ambiguity for uses of struct S0 { } in item position. The issue was solely about expression position prior to the adoption of Rust RFC 25.

Precedent for flexible syntax in Rust

There is precendent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity.

For example, one can often include an optional trailing comma, for example in: let x : &[int] = [3, 2, 1, ];.

One can also include redundant curly braces or parentheses, for example in:

println!("hi: {}", { if { x.len() > 2 } { ("whoa") } else { ("there") } });

One can even mix the two together when delimiting match arms:

    let z: int = match x {
        [3, 2] => { 3 }
        [3, 2, 1] => 2,
        _ => { 1 },
    };

We do have lints for some style violations (though none catch the cases above), but lints are different from fundamental language restrictions.

Recent history

There was a previous RFC PR that was effectively the same in spirit to this one. It was closed because it was not sufficient well fleshed out for further consideration by the core team. However, to save people the effort of reviewing the comments on that PR (and hopefully stave off potential bikeshedding on this PR), I here summarize the various viewpoints put forward on the comment thread there, and note for each one, whether that viewpoint would be addressed by this RFC (accept both syntaxes), by Always Require Braces, or by Status Quo.

Note that this list of comments is just meant to summarize the list of views; it does not attempt to reflect the number of commenters who agreed or disagreed with a particular point. (But since the RFC process is not a democracy, the number of commenters should not matter anyway.)

"+1" ==> Favors: This RFC (or potentially Always Require Braces; I think the content of RFC PR 147 shifted over time, so it is hard to interpret the "+1" comments now).
"I find let s = S0; jarring, think its an enum initially." ==> Favors: Always Require Braces
"Frequently start out with an empty struct and add fields as I need them." ==> Favors: This RFC or Always Require Braces
"Foo{} suggests is constructing something that it's not; all uses of the value Foo are indistinguishable from each other" ==> Favors: Status Quo
"I find it strange anyone would prefer let x = Foo{}; over let x = Foo;" ==> Favors Status Quo; strongly opposes Always Require Braces.
"I agree that 'instantiation-should-follow-declation', that is, structs declared ;, (), {} should only be instantiated [via] ;, (), { } respectively" ==> Opposes leniency of this RFC in that it allows expression to use include or omit {} on an empty struct, regardless of declaration form, and vice-versa.
"The code generation argument is reasonable, but I wouldn't want to force noisier syntax on all 'normal' code just to make macros work better." ==> Favors: This RFC