|
| 1 | +# Early and Late Bound Variables |
| 2 | + |
| 3 | +In Rust, item definitions (like `fn`) can often have generic parameters, which |
| 4 | +are always [_universally_ quantified][quant]. That is, if you have a function |
| 5 | +like |
| 6 | + |
| 7 | +```rust |
| 8 | +fn foo<T>(x: T) { } |
| 9 | +``` |
| 10 | + |
| 11 | +this function is defined "for all T" (not "for some specific T", which would be |
| 12 | +[_existentially_ quantified][quant]). |
| 13 | + |
| 14 | +[quant]: ./appendix/background.md#quantified |
| 15 | + |
| 16 | +While Rust *items* can be quantified over types, lifetimes, and constants, the |
| 17 | +types of values in Rust are only ever quantified over lifetimes. So you can |
| 18 | +have a type like `for<'a> fn(&'a u32)`, which represents a function pointer |
| 19 | +that takes a reference with any lifetime, or `for<'a> dyn Trait<'a>`, which is |
| 20 | +a `dyn` trait for a trait implemented for any lifetime; but we have no type |
| 21 | +like `for<T> fn(T)`, which would be a function that takes a value of *any type* |
| 22 | +as a parameter. This is a consequence of monomorphization -- to support a value |
| 23 | +of type `for<T> fn(T)`, we would need a single function pointer that can be |
| 24 | +used for a parameter of any type, but in Rust we generate customized code for |
| 25 | +each parameter type. |
| 26 | + |
| 27 | +One consequence of this asymmetry is a weird split in how we represesent some |
| 28 | +generic types: _early-_ and _late-_ bound parameters. |
| 29 | +Basically, if we cannot represent a type (e.g. a universally quantified type), |
| 30 | +we have to bind it _early_ so that the unrepresentable type is never around. |
| 31 | + |
| 32 | +Consider the following example: |
| 33 | + |
| 34 | +```rust,ignore |
| 35 | +fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... } |
| 36 | +``` |
| 37 | + |
| 38 | +We cannot treat `'a`, `'b`, and `T` in the same way. Types in Rust can't have |
| 39 | +`for<T> { .. }`, only `for<'a> {...}`, so whenever you reference `foo` the type |
| 40 | +you get back can't be `for<'a, 'b, T> fn(&'a u32, y: &'b T)`. Instead, the `T` |
| 41 | +must be substituted early. In particular, you have: |
| 42 | + |
| 43 | +```rust,ignore |
| 44 | +let x = foo; // T, 'b have to be substituted here |
| 45 | +x(...); // 'a substituted here, at the point of call |
| 46 | +x(...); // 'a substituted here with a different value |
| 47 | +``` |
| 48 | + |
| 49 | +## Early-bound parameters |
| 50 | + |
| 51 | +Early-bound parameters in rustc are identified by an index, stored in the |
| 52 | +[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes. |
| 53 | +The index counts from the outermost declaration in scope. This means that as you |
| 54 | +add more binders inside, the index doesn't change. |
| 55 | + |
| 56 | +For example, |
| 57 | + |
| 58 | +```rust,ignore |
| 59 | +trait Foo<T> { |
| 60 | + type Bar<U> = (Self, T, U); |
| 61 | +} |
| 62 | +``` |
| 63 | + |
| 64 | +Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a |
| 65 | +[`ParamTy`] with the index of `N`. |
| 66 | + |
| 67 | +In rustc, the [`Generics`] structure carries the this information. So the |
| 68 | +[`Generics`] for `Bar` above would be just like for `U` and would indicate the |
| 69 | +'parent' generics of `Foo`, which declares `Self` and `T`. You can read more |
| 70 | +in [this chapter](./generics.md). |
| 71 | + |
| 72 | +[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html |
| 73 | +[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html |
| 74 | +[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html |
| 75 | + |
| 76 | +## Late-bound parameters |
| 77 | + |
| 78 | +Late-bound parameters in `rustc` are handled quite differently (they are also |
| 79 | +specialized to lifetimes since, right now, only late-bound lifetimes are |
| 80 | +supported, though with GATs that has to change). We indicate their potential |
| 81 | +presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables |
| 82 | +there are at that binding level. This can only be determined by walking the |
| 83 | +type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be |
| 84 | +`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names |
| 85 | +of the things bound within. |
| 86 | + |
| 87 | +Moreover, a reference to a late-bound lifetime is written `^0.a`: |
| 88 | + |
| 89 | +- The `0` is the index; it identifies that this lifetime is bound in the |
| 90 | + innermost binder (the `for`). |
| 91 | +- The `a` is the "name"; late-bound lifetimes in rustc are identified by a |
| 92 | + "name" -- the [`BoundRegion`] struct. This struct can contain a |
| 93 | + [`DefId`][defid] or it might have various "anonymous" numbered names. The |
| 94 | + latter arise from types like `fn(&u32, &u32)`, which are equivalent to |
| 95 | + something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those |
| 96 | + lifetimes must be generated. |
| 97 | + |
| 98 | +This setup of not knowing the full set of variables at a binding level has some |
| 99 | +advantages and some disadvantages. The disadvantage is that you must walk the |
| 100 | +type to find out what is bound at the given level and so forth. The advantage |
| 101 | +is primarily that, when constructing types from Rust syntax, if we encounter |
| 102 | +anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have |
| 103 | +to update the binder. |
| 104 | + |
| 105 | +[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html |
| 106 | +[`BoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegion.html |
| 107 | +[defid]: ./hir.html#identifiers-in-the-hir |
0 commit comments