lang (typesystem | dst)
Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas.
Have repeat expressions with a length that captures local variables be such an expression, returning a [T] slice.
Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas.
There are 2 motivations for this RFC:
Passing unsized values, such as trait objects, to functions by value currently has to go through a Box<T> with an unnecessary allocation.
One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work:
fn takes_closure(f: FnOnce()) { f(); }
But today you have to use a hack, such as taking a Box<FnBox<()>>.
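For comparison, here is a minimal sketch of the boxed workaround as it can be written on stable Rust 1.35 and later, where Box<dyn FnOnce()> is directly callable (a fact about later compilers, not part of the RFC text). The names takes_boxed_closure and describe are hypothetical. The Box::new call is the heap allocation this RFC would like to make unnecessary.

```rust
// Hypothetical helper: accepts a consuming closure as a trait object.
// The closure is boxed because an unsized `dyn FnOnce` cannot be
// passed by value on stable Rust.
fn takes_boxed_closure(f: Box<dyn FnOnce() -> String>) -> String {
    f() // consumes the box together with the captured environment
}

// Hypothetical caller: moves `name` into the closure, then boxes it.
fn describe(name: String) -> String {
    takes_boxed_closure(Box::new(move || format!("consumed {}", name)))
}
```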
Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following:
This also allows passing unsized values to functions, with the ABI being as if a &move pointer was passed (a (by-move-data, extra) pair). This also means that methods taking self by value are object-safe, though vtable shims are sometimes needed to translate the ABI (as the callee-side intentionally does not pass extra to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly).
For example:
struct StringData {
    len: usize,
    data: [u8],
}

// Hypothetical consumer that takes an unsized value by value.
fn takes_bytes(s: [u8]) { /* .. */ }

fn foo(s1: Box<StringData>, s2: Box<StringData>, cond: bool) {
    // this creates a VLA copy of either `s1.data` or `s2.data` on
    // the stack.
    let mut s = if cond {
        s1.data
    } else {
        s2.data
    };
    drop(s1);
    drop(s2);
    takes_bytes(s);
}
fn example(f: for<'a> FnOnce(&'a X<'a>)) {
    let x = X::new();
    f(&x); // aka FnOnce::call_once(f, (&x,));
}
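As a point of comparison for by-value self, stable Rust already treats methods that take self: Box<Self> as object-safe. The sketch below uses that form. The Task trait, the Greet type and run_task are hypothetical names, and the code shows the boxed status quo rather than the ABI proposed here.

```rust
// Hypothetical trait: `self: Box<Self>` is callable on a trait object
// on stable Rust; under this RFC a plain `self` receiver would also work.
trait Task {
    fn run(self: Box<Self>) -> String;
}

struct Greet(String);

impl Task for Greet {
    fn run(self: Box<Self>) -> String {
        format!("hello, {}", self.0)
    }
}

// Consumes the trait object through dynamic dispatch.
fn run_task(task: Box<dyn Task>) -> String {
    task.run()
}
```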
Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type [T] with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example:
extern "C" {
    fn random() -> usize;
}

fn foo(n: usize) {
    let x = [0u8; n]; // x: [u8]
    let x = [0u8; n + (random() % 100)]; // x: [u8]
    let x = [0u8; 42]; // x: [u8; 42], like today
    let x = [0u8; random() % 100]; //~ ERROR constant evaluation error
}
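Without this feature, a run-time length needs a heap allocation. A minimal stable-Rust sketch of the closest equivalent to [0u8; n] uses vec!, whose contents deref to a [u8] slice. The names sum_bytes and zeroed_len are hypothetical, and the sketch illustrates the types involved, not the alloca-based implementation proposed here.

```rust
// Hypothetical consumer that only needs a `[u8]` slice.
fn sum_bytes(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

// Hypothetical stand-in for `let x = [0u8; n];`: the length is only
// known at run time, so on stable Rust the buffer lives on the heap.
fn zeroed_len(n: usize) -> usize {
    let x: Vec<u8> = vec![0u8; n];
    let slice: &[u8] = &x; // deref to the unsized `[u8]`
    debug_assert_eq!(sum_bytes(slice), 0);
    slice.len()
}
```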
"captures a variable" - as in RFC #1558 - is used as the condition for making the result type [T] because it is simple, easy to understand, and introduces no type-checking complications.
The last error message could have a user-helpful note, for example "extract the length to a local variable if you want a variable-length array".
The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a Use or a Repeat and both can be translated easily.
Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size.
MIR construction remains unchanged.
MIR likes to create lots of temporaries for order-of-evaluation (OOE) reasons. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?).
TODO: add description of problem & solution.
Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book.
The "guaranteed temporary elimination" rules require more work to teach. It might be better to come up with new rules entirely.
In unsafe code, it is very easy to create unintended temporaries, such as in:
unsafe fn poke(ptr: *mut [u8]) { /* .. */ }
unsafe fn foo(mut a: [u8]) {
    let ptr: *mut [u8] = &mut a;
    // here, `a` must be copied to a temporary, because
    // `poke(ptr)` might access the original.
    bar(a, poke(ptr));
}
If we make [u8] be Copy, that would be even easier, because even uses of poke(ptr); after the function call could potentially access the supposedly-valid data behind a.
And even if it is not as easy, it is possible to accidentally create temporaries in safe code.
Unsized temporaries are dangerous - they can easily cause aborts through stack overflow.
There are several alternative options for the VLA syntax.
[t; φ] has type [T; φ] if φ captures no variables and type [T] if φ captures a variable.
[t; foo()] requires the length to be extracted to a local.
[t; φ] has type [T; φ] if φ is a constexpr, otherwise [T].
[t; φ] has type [T] if it is evaluated in a context that expects that type (for example [t; foo()]: [T]) and [T; _] otherwise.
&foo borrow expressions (as in, whether a borrow is treated as a "safe" or "unsafe" borrow - I'll write more details sometime), it might be better to not rely on expected types too much.
[t; virtual φ].
std::intrinsics::repeat(t, n) or something.
Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in:
let len_ = s.len();
let p = Box::new(PascalString {
    length: len_,
    data: *s
});
However, without some way to guarantee that this can be done without allocas, that might be a large footgun.
One somewhat-orthogonal proposal that came up was to make Clone (and therefore Copy) not depend on Sized, and to make [u8] be Copy, by moving the Self: Sized bound from the trait to the methods, i.e. using the following declaration:
pub trait Clone {
    fn clone(&self) -> Self where Self: Sized;
    fn clone_from(&mut self, source: &Self) where Self: Sized {
        // ...
    }
}
That would be a backwards-compatibility-breaking change, because today T: Clone + ?Sized (or of course Self: Clone in a trait context, with no implied Self: Sized) implies that T: Sized, but it might be that its impact is small enough to allow (and even if not, it might be worth it for Rust 2.0).
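The implication mentioned above can be observed on stable Rust. In this sketch (the function name is hypothetical), the body calls size_of::<T>(), which requires T: Sized. It is expected to compile only because Clone currently has Sized as a supertrait, so the ?Sized relaxation has no effect.

```rust
use std::mem::size_of;

// `?Sized` removes the implicit bound, but `Clone: Sized` puts it back,
// so `size_of::<T>()` (which needs `T: Sized`) is accepted here.
fn size_via_clone_bound<T: Clone + ?Sized>() -> usize {
    size_of::<T>()
}
```

Code of this shape would stop compiling if the Sized bound moved from the trait to its methods, which is the kind of breakage the paragraph above refers to.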
How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc?
How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries.
Accumulative allocas (aka 'fn borrows) are beyond the scope of this RFC.
See alternatives.