diff --git a/text/0000-stmtize-looping-forms.md b/text/0000-stmtize-looping-forms.md new file mode 100644 index 00000000000..ced0082211f --- /dev/null +++ b/text/0000-stmtize-looping-forms.md @@ -0,0 +1,371 @@ +- Feature Name: looping_expr_forms +- Start Date: 2015-03-08 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Restrict grammar of Rust language for 1.0 so that all looping +syntactic forms (`for`, `loop`, `while`, and `while let`) are +statements (instead of expressions). Forms like `let d = loop { };` and +`(while foo() { })` all produce errors at parse time. + +The errors also provide a note saying that one can write `{ + }` in the expression's original context to get the +program building again. + +The feature name above (`looping_expr_forms`) is used to restore +support for parsing them as expressions, with the understanding that +the semantics of looping forms in expression context is unstable. + + +# Motivation + +[RFC PR 352] ("Allow loops to return values other than ()") proposes +extending the syntax of the looping forms so that they can be used to +return non-unit values. + + * (In a nutshell, the syntax changes of [RFC PR 352] use `break + ` to indicate the value to return. Since generally + `for`, `while`, and `while let` loops do not terminate via `break`, + it also adds the suffix form ` { } + else { }` to such loops that return non-unit values. + However, these details are *not* the subject of this RFC.) + +The changes that [RFC PR 352] proposes are backwards compatible. +However, a number of commenters did not see the proposed extension as +a natural fit for the Rust language, and some commenters suggested +alternative approaches that might work better. + +[RFC PR 352]: https://github.com/rust-lang/rfcs/pull/352 + +This document is *not* assigning any judgment about whether non-unit +looping expressions should or should not be adopted. It is merely +proposing that we can increase the range of variants of non-unit +looping expressions that can be added backwards-compatibly if we start +by first classifying all of today's looping forms as statements +rather than expressions. + +In other words, this document is proposing that by imposing a (*very +small*) restriction today, we avoid painting ourselves into a corner +with respect to how we handle [RFC PR 352] tomorrow. + +# Detailed design + +Change the parser so that the existing looping forms (`for`, `loop`, +`while`, `while let`) can be parsed in statement context, and remove +the ability to parse those forms in expression context (unless the +feature `looping_expr_forms` is turned on). + +In a context that can accept either a statement or an expression +(e.g. the final form in a block expression), disambiguate by always +choosing the statement form of a loop rather than the expression form. +(For discussion of this point, see [Does this change accomplish nothing].) + +## The grammar + +In the current grammar, I believe the effect of the above paragraph +can be roughly summarized as: + +In the grammatical production for ``, add the cases +``` + ::= ... + | + | + | + | + + ::= + ::= + ::= + ::= +``` + +In the grammatical production for ``, +``` + ::= ... + | + | + | + | +``` +conditionalize the latter four productions so that they are +only present when the feature `looping_expr_forms` is turned on. +(I am not sufficiently familiar with bison grammar specifications to +know how to encode the statement/expression context disambiguation +rule.) + +(The main reason I want to continue supporting the expression forms is +to encourage the compiler developers to continue thinking of the +looping forms as expressions, even if the stable language only exposes +them as statements. If we remove support for parsing them as +expressions, then it will be harder to re-add it later.) + +## Examples + +Most important: Much Rust code will continue to compile without +needing any modification under the change proposed here. In particular, +in the prototype developed by pnkfelix, only 17 files needed to be +changed after the parser itself had been updated. *Seventeen* -- that's +*including* the test suite. + +You can see the [prototype fallout], as well as the [prototype parser changes] pnkfelix +used to emulate this restriction to the language. + +[prototype fallout]: https://github.com/pnkfelix/rust/compare/c1e2151e2b5fd88c4d335a1671c373845b95a675...32c5726452ff875694c4aed81b87206faadcb640 + +[prototype parser changes]: https://github.com/pnkfelix/rust/commit/c1e2151e2b5fd88c4d335a1671c373845b95a675 + +Here are some examples of expressions that will start to +fail to compile: + +```rust +// cannot use stmt as RHS of a let +let a = while foo() { }; + +// cannot use stmt directly as body of arm +match b { 1 => (), _ => loop { }, } + +// cannot use stmt as content of paren expr +let c = { hi(); (for i in 0.. { }); bye() }; + +// cannot use stmt as argument to function call +hello(while let Some(x) = i.next() { }); +``` + +However, *any* code that does start to fail to compile can be fixed by +wrapping the former expression within a trivial block expression: + +```rust +let a = { while foo() { } }; + +match b { 1 => (), _ => { loop { } } } + +// (this is silly, of course; +// but wasn't it silly already? -- well, see next section.) +let c = { hi(); {(for i in 0.. { })}; bye() }; + +hello({while let Some(x) = i.next() { }}); +``` + +The reason that much code continues to compile without requiring any +modification is that all of the looping forms currently have type +`()`, and thus the change to parsing them as statements in the context +of a block has no effect in the end. + + +## Does this change accomplish nothing? +[Does this change accomplish nothing]: #does-this-change-accomplish-nothing + +The above text introduces the block expression workaround +(which I will call just the "block workaround") of replacing +e.g. `let d: () = loop { };` with `let d: () = { loop { } };`. +One might reasonably wonder: +Is it not true that the block workaround proves that this change is +accomplishing nothing, at least if the goal is to maximize flexibility +for the future? + +As a concrete example: let us assume, for the sake of argument, that +some future core team decides that the looping expression forms should +return `Option` instead of `()`. But then consider code using the +block workaround: once support for the new looping expression forms is +turned on, won't the `{ loop { } }` block treat the inner `loop { }` +no longer as a statement, but instead as a tail expression in the +block? And therefore the code `let d: () = { loop { } };` would start +to fail to compile (since the return type of `loop { }` is some form +of `Option<_>`, which will not unify with the unit type `()` that has +been explicitly ascribed to the let-binding for `d`. + +The mistake in the above reasoning is that it overlooks the detail in +this proposal that in the final form in a block expression, this RFC +states that we disambiguate by always choosing the statement form of a +loop rather than the expression form. Thus, `{ loop { } }` would +continue to have the type `()`. In the context of the final form in a +block, the `loop { }` is a statement unless it is e.g. wrapped in +parentheses to force it to be an expression, concretely: + +```rust +let d: Option<_> = { ...; (loop { }) }; +``` + +If the use of parentheses here seems overly strange to you, +take heart in the fact that there is precedent, sort of; +see the appendix [Precedent for proposed disambiguation]. + +# Drawbacks + +It is a restriction designed to accommodate a hypothesized variant of +the non-unit looping expressions feature, but the language may never +adopt the feature; furthermore, it is possible to adopt the feature +without requiring this restriction, as described in [RFC PR 352] as +written. + +# Alternatives + + * Obvious 1: We could jump straight to adopting one of the non-unit + looping expressions variants. + + * Obvious 2: We could decide that non-unit looping expressions are + only worth adopting if they can be added without requiring a change + like this. + + Okay, now to less obvious. + + * "Change only for/while": We could restrict *just* `for`, `while`, + and `while let` to + statement forms, but leave `loop { ... }` as an expression. + + A justification for this is that there are more issues to resolve + for the former three forms (namely in terms of how to deal with + control-flow that does not hit a `break `). + + But `loop { ... }` is really simple. Its control flow never exits + to its immediate expression context, except via `break`, and thus + one can justify leaving `loop` as an expression form, and just + dictate that any future additions for looping expression will + have to accommodate it (presumably in a manner similar to that used + by [RFC PR 352], i.e. `break` with no attached `` is sugar + for `break ()`). + + Adopting this variant would tie our hands slightly, but not + too much, and would actually remove much of the (already + tiny) [prototype fallout], since many of the cases that + had to be updated were uses of `let d = loop { };` + within test code exercising the type system + (i.e. essentially as synonymous for `let d = unimplemented!();`). + + * (From [Ericson2314 comment](https://github.com/rust-lang/rfcs/pull/955#issuecomment-77790727)): "Disambiguate block-tail as expression, not statement.": + This would make `{ }` continue to parse as a block + with no statements and a single tail expression; it would thus be + rejected unless the `looping_expr_forms` feature gate were enabled. + One would have to add a semicolon, `{ ; }`, in order + to get one's code compiling in the short term without the feature gate. + + However, we would also apply to looping-forms whatever magic we + use in the parser to favor eagerly parsing if- and block-forms as + statements. In particular: we would have the following + + ```rust + // IF-EXPR: this is what happens *today* + { if false {} else {} - 3 } // ==> -3 + { (if false {} else {} - 3) } // type error + { if false { 3 } else { 3 } } // ==> 3 + { if false { 3 } else { 3 }; } // ==> () + + // BLOCK-EXPR: this is what happens *today* + { { } - 3 } // ==> -3 + { ({ } - 3) } // type error + { { 3 } } // ==> 3 + { { 3 }; } // ==> () + + // LOOPING-EXPR: this is what this alternative proposes + { loop { break } - 3 } // ==> -3 + { (loop { break } - 3) } // type error + { loop { break } } // syntax error (*for now*) + { loop { break }; } // ==> (), now and forever after + + // compare the latter against: + // LOOPING-EXPR: this is what the RFC as written proposes + { loop { break } - 3 } // ==> -3 + { (loop { break } - 3) } // type error + { loop { break } } // ==> (), now and forever after + { loop { break }; } // ==> () + ``` + + The motivation for this alternative arises from the case labeled + "syntax error (*for now*)" in the code fragment above. + The idea is that having this in place today would set the stage + for supporting `break ` in the future that works even + for looping forms that occur directly in block tail position, + as shown here: + + ```rust + // LOOPING-EXPR: this is what this alternative proposes for future + { loop { break 4 } - 3 } // ==> -3 + { (loop { break 4 } - 3) } // ==> 1 + { loop { break 3 } } // ==> 3 // hard to accommodate under RFC as written + { loop { break 3 }; } // ==> () + ``` + + Advantage: This alternative approach would make the analogy + presented in [Precedent for proposed disambiguation] much + stronger, which may make the whole system more coherent and easier + to explain overall. + + Drawback: It probably also would require significantly more + changes to existing code. (I do not yet have an estimate of the + amount of changes required to the rust repo itself; if I have a + chance to amend my prototype I will do so.) + +# Unresolved questions + +None. + +# Appendices + +## Precedent for proposed disambiguation +[Precedent for proposed disambiguation]: #precedent-for-proposed-disambiguation + +The statement/expression ambiguity at the end of a block expression +discussed above, and this same manner of disambiguation, has precedent +in Rust. Sort of. + +In particular, an `if`-expression followed by minus sign and an +expression could be interpreted as either a statement followed by a +unary negation, or as a single subtraction with the `if`-expression as +its [minuend]. Just as in this RFC, the situation with +`if`-expressions is disambiguated by treating it as a statement +followed by a unary negation. + +[minuend]: http://en.wikipedia.org/wiki/Subtraction#Notation_and_terminology + +Here is a concrete example of this phenomenon. + +```rust +// (not using Default so that we can differentiate negation from minus +// expression) +trait Def { fn def() -> Self; } +impl Def for () { fn def() -> () { () } } +impl Def for i32 { fn def() -> i32 { -1 } } + +fn h(context: &str) -> D { + let d = Def::def(); + println!("h from {} returns {:?}", context, d); + d +} + +fn g(t: bool) -> i32 { + (if t { h("g true") } else { h("g false") } - 3) +} +// ^~~~~~~~~~~~~~~ if-expression returning -1 +// +// -1 - 3 ==> -4 + +fn f(t: bool) -> i32 { + if t { h("f true") } else { h("f true") } - 3 +} +// ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ statement +// (nested if is forced to have type `()`) +// +// (); -3 ==> -3 + +fn main() { + let x = f(true); + let y = g(false); println!("x: {} y: {}", x, y); +} +``` + +The above code prints + +``` +h from f true returns () +h from g false returns -1 +x: -3 y: -4 +``` + +However, I am drawing somewhat of a false analogy here. Namely, an +if-expression as the last form in a block will today be parsed as an +expression; this RFC is written under the expectation that if we do +adopt a form of non-unit looping expressions that require non-unit +return type, then we probably will require the use of parentheses +around such forms when they appear as the tail expression of a block.