diff --git a/text/1909-unsized-rvalues.md b/text/1909-unsized-rvalues.md new file mode 100644 index 00000000000..ac51516c22f --- /dev/null +++ b/text/1909-unsized-rvalues.md @@ -0,0 +1,203 @@ +- Feature Name: unsized_locals +- Start Date: 2017-02-11 +- RFC PR: https://github.com/rust-lang/rfcs/pull/1909 +- Rust Issue: https://github.com/rust-lang/rust/issues/48055 + +# Summary +[summary]: #summary + +Allow for local variables, function arguments, and some expressions to have an unsized type, and implement it by storing the temporaries in variably-sized allocas. + +Have repeat expressions with a length that captures local variables be such an expression, returning an `[T]` slice. + +Provide some optimization guarantees that unnecessary temporaries will not create unnecessary allocas. + +# Motivation +[motivation]: #motivation + +There are 2 motivations for this RFC: + +1. Passing unsized values, such as trait objects, to functions by value is often desired. Currently, this must be done through a `Box` with an unnecessary allocation. + + One particularly common example is passing closures that consume their environment without using monomorphization. One would like for this code to work: + + ```Rust + fn takes_closure(f: FnOnce()) { f(); } + ``` + + But today you have to use a hack, such as taking a `Box>`. + +2. Allocating a runtime-sized variable on the stack is important for good performance in some use-cases - see RFC #1808, which this is intended to supersede. + +# Detailed design +[design]: #detailed-design + +## Unsized Rvalues - language + +Remove the rule that requires all locals and rvalues to have a sized type. Instead, require the following: + +1. The following expressions must always return a Sized type: + 1. Function calls, method calls, operator expressions + - implementing unsized return values for function calls would require the *called function* to do the alloca in our stack frame. + 2. ADT expressions + - see alternatives + 3. cast expressions + - this seems like an implementation simplicity thing. These can only be trivial casts. +2. The RHS of assignment expressions must always have a Sized type. + - Assigning an unsized type is impossible because we don't know how much memory is available at the destination. This applies to ExprAssign assignments and not to StmtLet let-statements. + +This also allows passing unsized values to functions, with the ABI being as if a `&move` pointer was passed (a `(by-move-data, extra)` pair). This also means that methods taking `self` by value are object-safe, though vtable shims are sometimes needed to translate the ABI (as the callee-side intentionally does not pass `extra` to the fn in the vtable, no vtable shim is needed if the vtable function already takes its argument indirectly). + +For example: + +```Rust +struct StringData { + len: usize, + data: [u8], +} + +fn foo(s1: Box, s2: Box, cond: bool) { + // this creates a VLA copy of either `s1.1` or `s2.1` on + // the stack. + let mut s = if cond { + s1.data + } else { + s2.data + }; + drop(s1); + drop(s2); + foo(s); +} + +fn example(f: for<'a> FnOnce(&'a X<'a>)) { + let x = X::new(); + f(x); // aka FnOnce::call_once(f, (x,)); +} +``` + +## VLA expressions + +Allow repeat expressions to capture variables from their surrounding environment. If a repeat expression captures such a variable, it has type `[T]` with the length being evaluated at run-time. If the repeat expression does not capture any variable, the length is evaluated at compile-time. For example: +```Rust +extern "C" { + fn random() -> usize; +} + +fn foo(n: usize) { + let x = [0u8; n]; // x: [u8] + let x = [0u8; n + (random() % 100)]; // x: [u8] + let x = [0u8; 42]; // x: [u8; 42], like today + let x = [0u8; random() % 100]; //~ ERROR constant evaluation error +} +``` +"captures a variable" - as in RFC #1558 - is used as the condition for making the return be `[T]` because it is simple, easy to understand, and introduces no type-checking complications. + +The last error message could have a user-helpful note, for example "extract the length to a local variable if you want a variable-length array". + +## Unsized Rvalues - MIR + +The way this is implemented in MIR is that operands, rvalues, and temporaries are allowed to be unsized. An unsized operand is always "by-ref". Unsized rvalues are either a `Use` or a `Repeat` and both can be translated easily. + +Unsized locals can never be reassigned within a scope. When first assigning to an unsized local, a stack allocation is made with the correct size. + +MIR construction remains unchanged. + +## Guaranteed Temporary Elision + +MIR likes to create lots of temporaries for OOE reason. We should optimize them out in a guaranteed way in these cases (FIXME: extend these guarantees to locals aka NRVO?). + +TODO: add description of problem & solution. + +# How We Teach This +[teach]: #how-we-teach-this + +Passing arguments to functions by value should not be too complicated to teach. I would like VLAs to be mentioned in the book. + +The "guaranteed temporary elimination" rules require more work to teach. It might be better to come up with new rules entirely. + +# Drawbacks +[drawbacks]: #drawbacks + +In Unsafe code, it is very easy to create unintended temporaries, such as in: +```Rust +unsafe fn poke(ptr: *mut [u8]) { /* .. */ } +unsafe fn foo(mut a: [u8]) { + let ptr: *mut [u8] = &mut a; + // here, `a` must be copied to a temporary, because + // `poke(ptr)` might access the original. + bar(a, poke(ptr)); +} +``` + +If we make `[u8]` be `Copy`, that would be even easier, because even uses of `poke(ptr);` after the function call could potentially access the supposedly-valid data behind `a`. + +And even if it is not as easy, it is possible to accidentally create temporaries in safe code. + +Unsized temporaries are dangerous - they can easily cause aborts through stack overflow. + +# Alternatives +[alternatives]: #alternatives + +## The bikeshed + +There are several alternative options for the VLA syntax. + +1. The RFC choice, `[t; φ]` has type `[T; φ]` if `φ` captures no variables and type `[T]` if φ captures a variable. + - pro: can be understood using "HIR"/resolution only. + - pro: requires no additional syntax. + - con: might be confusing at first glance. + - con: `[t; foo()]` requires the length to be extracted to a local. +2. The "permissive" choice: `[t; φ]` has type `[T; φ]` if `φ` is a constexpr, otherwise `[T]` + - pro: allows the most code + - pro: requires no additional syntax. + - con: depends on what is exactly a const expression. This is a big issue because that is both non-local and might change between rustc versions. +3. Use the expected type - `[t; φ]` has type `[T]` if it is evaluated in a context that expects that type (for example `[t; foo()]: [T]`) and `[T; _]` otherwise. + - pro: in most cases, very human-visible. + - pro: requires no additional syntax. + - con: relies on the notion of "expected type". While I think we *do* have to rely on that in the unsafe code semantics of `&foo` borrow expressions (as in, whether a borrow is treated as a "safe" or "unsafe" borrow - I'll write more details sometime), it might be better to not rely on expected types too much. +4. use an explicit syntax, for example `[t; virtual φ]`. + - bikeshed: exact syntax. + - pro: very explicit and visible. + - con: more syntax. +5. use an intrinsic, `std::intrinsics::repeat(t, n)` or something. + - pro: theoretically minimizes changes to the language. + - con: requires returning unsized values from intrinsics. + - con: unergonomic to use. + +## Unsized ADT Expressions + +Allowing unsized ADT expressions would make unsized structs constructible without using unsafe code, as in: +```Rust +let len_ = s.len(); +let p = Box::new(PascalString { + length: len_, + data: *s +}); +``` + +However, without some way to guarantee that this can be done without allocas, that might be a large footgun. + +## Copy Slices + +One somewhat-orthogonal proposal that came up was to make `Clone` (and therefore `Copy`) not depend on `Sized`, and to make `[u8]` be `Copy`, by moving the `Self: Sized` bound from the trait to the methods, i.e. using the following declaration: +```Rust +pub trait Clone { + fn clone(&self) -> Self where Self: Sized; + fn clone_from(&mut self, source: &Self) where Self: Sized { + // ... + } +} +``` + +That would be a backwards-compatability-breaking change, because today `T: Clone + ?Sized` (or of course `Self: Clone` in a trait context, with no implied `Self: Sized`) implies that `T: Sized`, but it might be that its impact is small enough to allow (and even if not, it might be worth it for Rust 2.0). + +# Unresolved questions +[unresolved]: #unresolved-questions + +How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc? + +How do we handle truely-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries. + +Accumulative allocas (aka `'fn` borrows) are beyond the scope of this RFC. + +See alternatives.