Commit d253f98

Merge pull request #190 from nikomatsakis/mir-borrow-check-1
start to document MIR borrow check
2 parents 1fa5685 + a628418 commit d253f98

File tree

10 files changed

+262
-67
lines changed

.travis.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ cache:
33
- cargo
44
before_install:
55
- shopt -s globstar
6-
- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md
6+
- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md
77
install:
88
- source ~/.cargo/env || true
99
- bash ci/install.sh

ci/check_line_lengths.sh

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
if [ "$1" == "--help" ]; then
44
echo 'Usage:'
5-
echo ' MAX_LINE_LENGTH=80' "$0" 'src/**/*.md'
5+
echo ' MAX_LINE_LENGTH=100' "$0" 'src/**/*.md'
66
exit 1
77
fi
88

src/SUMMARY.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -53,9 +53,12 @@
5353
- [MIR construction](./mir/construction.md)
5454
- [MIR visitor and traversal](./mir/visitor.md)
5555
- [MIR passes: getting the MIR for a function](./mir/passes.md)
56-
- [MIR borrowck](./mir/borrowck.md)
57-
- [MIR-based region checking (NLL)](./mir/regionck.md)
5856
- [MIR optimizations](./mir/optimizations.md)
57+
- [The borrow checker](./borrow_check.md)
58+
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
59+
- [Move paths](./borrow_check/moves_and_initialization/move_paths.md)
60+
- [MIR type checker](./borrow_check/type_check.md)
61+
- [Region inference](./borrow_check/region_inference.md)
5962
- [Constant evaluation](./const-eval.md)
6063
- [miri const evaluator](./miri.md)
6164
- [Parameter Environments](./param_env.md)

src/appendix/glossary.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ MIR | the Mid-level IR that is created after type-checking
4040
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
4141
normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](./traits/associated-types.html#normalize)
4242
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
43-
NLL | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
43+
NLL | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
4444
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
4545
obligation | something that must be proven by the trait system ([see more](traits/resolution.html))
4646
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref)
@@ -53,7 +53,7 @@ rib | a data structure in the name resolver that keeps trac
5353
sess | the compiler session, which stores global data used throughout compilation
5454
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
5555
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
56-
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details.
56+
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details.
5757
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
5858
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
5959
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)

src/borrow_check.md

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
# MIR borrow check
2+
3+
The borrow check is Rust's "secret sauce" – it is tasked with
4+
enforcing a number of properties:
5+
6+
- That all variables are initialized before they are used.
7+
- That you can't move the same value twice.
8+
- That you can't move a value while it is borrowed.
9+
- That you can't access a place while it is mutably borrowed (except through
10+
the reference).
11+
- That you can't mutate a place while it is shared borrowed.
12+
- etc
13+
14+
At the time of this writing, the code is in a state of transition. The
15+
"main" borrow checker still works by processing [the HIR](hir.html),
16+
but that is being phased out in favor of the MIR-based borrow checker.
17+
Accordingly, this documentation focuses on the new, MIR-based borrow
18+
checker.
19+
20+
Doing borrow checking on MIR has several advantages:
21+
22+
- The MIR is *far* less complex than the HIR; the radical desugaring
23+
helps prevent bugs in the borrow checker. (If you're curious, you
24+
can see
25+
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
26+
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
27+
which are regions derived from the control-flow graph.
28+
29+
[47366]: https://github.com/rust-lang/rust/issues/47366
30+
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
31+
32+
### Major phases of the borrow checker
33+
34+
The borrow checker source is found in
35+
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
36+
the [`mir_borrowck`] query.
37+
38+
[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
39+
[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html
40+
41+
- We first create a **local copy** of the MIR. In the coming steps,
42+
we will update this copy in place so that its types and related data
43+
include references to the new regions that we are computing.
44+
- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
45+
Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
46+
the MIR with fresh [inference variables](./appendix/glossary.html).
47+
- Next, we perform a number of
48+
[dataflow analyses](./appendix/background.html#dataflow) that
49+
compute what data is moved and when.
50+
- We then do a [second type check](borrow_check/type_check.html) across the MIR:
51+
the purpose of this type check is to determine all of the constraints between
52+
different regions.
53+
- Next, we do [region inference](borrow_check/region_inference.html), which computes
54+
the value of each region, which is basically a set of points in the control-flow graph.
55+
- At this point, we can compute the "borrows in scope" at each point.
56+
- Finally, we do a second walk over the MIR, looking at the actions it
57+
does and reporting errors. For example, if we see a statement like
58+
`*a + 1`, then we would check that the variable `a` is initialized
59+
and that it is not mutably borrowed, as either of those would
60+
require an error to be reported.
61+
- Doing this check requires the results of all the previous analyses.
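
The final walk described in the list above can be pictured with a toy model. This is only a sketch under stated assumptions: `StateAtPoint`, `ReadError`, and `check_read` are hypothetical names invented for illustration, and the real checker works on MIR places and dataflow results rather than variable names.

```rust
use std::collections::HashSet;

/// Hypothetical summary of the earlier analyses at one program point.
#[derive(Debug, Default)]
pub struct StateAtPoint {
    /// Variables known to be initialized at this point.
    pub initialized: HashSet<&'static str>,
    /// Variables with a mutable borrow in scope at this point.
    pub mutably_borrowed: HashSet<&'static str>,
}

/// Hypothetical error kinds for a read access.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Uninitialized,
    MutablyBorrowed,
}

/// Checks a read of `var`, such as the read of `a` in `*a + 1`.
pub fn check_read(state: &StateAtPoint, var: &str) -> Result<(), ReadError> {
    if !state.initialized.contains(var) {
        return Err(ReadError::Uninitialized);
    }
    if state.mutably_borrowed.contains(var) {
        return Err(ReadError::MutablyBorrowed);
    }
    Ok(())
}
```

The point of the sketch is the dependency order: the check itself is trivial once the initialization and borrow sets for the current point are available.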
62+
63+
[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html

src/borrow_check/moves_and_initialization.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
# Tracking moves and initialization
2+
3+
Part of the borrow checker's job is to track which variables are
4+
"initialized" at any given point in time -- this also requires
5+
figuring out where moves occur and tracking those.
6+
7+
## Initialization and moves
8+
9+
From a user's perspective, initialization -- giving a variable some
10+
value -- and moves -- transferring ownership to another place -- might
11+
seem like distinct topics. Indeed, our borrow checker error messages
12+
often talk about them differently. But **within the borrow checker**,
13+
they are not nearly as separate. Roughly speaking, the borrow checker
14+
tracks the set of "initialized places" at any point in the source
15+
code. Assigning to a previously uninitialized local variable adds it
16+
to that set; moving from a local variable removes it from that set.
17+
18+
Consider this example:
19+
20+
```rust,ignore
21+
fn foo() {
22+
let a: Vec<u32>;
23+
24+
// a is not initialized yet
25+
26+
a = vec![22];
27+
28+
// a is initialized here
29+
30+
std::mem::drop(a); // a is moved here
31+
32+
// a is no longer initialized here
33+
34+
let l = a.len(); //~ ERROR
35+
}
36+
```
37+
38+
Here you can see that `a` starts off as uninitialized; once it is
39+
assigned, it becomes initialized. But when `drop(a)` is called, that
40+
moves `a` into the call, and hence it becomes uninitialized again.
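
The "set of initialized places" view can be sketched for straight-line code as follows. This is a toy model, not how rustc implements it: the `Action` enum and `first_error` function are hypothetical, and variables are plain strings instead of MIR places.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified actions on local variables.
#[derive(Debug, Clone, Copy)]
pub enum Action<'a> {
    /// `x = value`: adds `x` to the initialized set.
    Assign(&'a str),
    /// A move out of `x`: removes `x` from the initialized set.
    Move(&'a str),
    /// A read of `x`: requires `x` to be in the initialized set.
    Use(&'a str),
}

/// Walks the actions in order and returns the index of the first one
/// that moves or reads an uninitialized variable, if any.
pub fn first_error<'a>(actions: &[Action<'a>]) -> Option<usize> {
    let mut initialized: HashSet<&str> = HashSet::new();
    for (i, action) in actions.iter().enumerate() {
        match *action {
            Action::Assign(var) => {
                initialized.insert(var);
            }
            Action::Move(var) => {
                // `remove` returns false if `var` was not in the set,
                // i.e. we tried to move an uninitialized variable.
                if !initialized.remove(var) {
                    return Some(i);
                }
            }
            Action::Use(var) => {
                if !initialized.contains(var) {
                    return Some(i);
                }
            }
        }
    }
    None
}
```

Encoding `foo` from the example as `[Assign("a"), Move("a"), Use("a")]` flags the last action, which corresponds to the `a.len()` call.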
41+
42+
## Subsections
43+
44+
To make it easier to peruse, this section is broken into a number of
45+
subsections:
46+
47+
- [Move paths](./moves_and_initialization/move_paths.html): the
48+
*move path* concept that we use to track which local variables (or parts of
49+
local variables, in some cases) are initialized.
50+
- TODO *Rest not yet written* =)

src/borrow_check/moves_and_initialization/move_paths.md

Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,128 @@
1+
# Move paths
2+
3+
In reality, it's not enough to track initialization at the granularity
4+
of local variables. Rust also allows us to do moves and initialization
5+
at the field granularity:
6+
7+
```rust,ignore
8+
fn foo() {
9+
let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
10+
11+
// a.0 and a.1 are both initialized
12+
13+
let b = a.0; // moves a.0
14+
15+
    // a.0 is not initialized, but a.1 still is
16+
17+
let c = a.0; // ERROR
18+
let d = a.1; // OK
19+
}
20+
```
21+
22+
To handle this, we track initialization at the granularity of a **move
23+
path**. A [`MovePath`] represents some location that the user can
24+
initialize, move, etc. So e.g. there is a move-path representing the
25+
local variable `a`, and there is a move-path representing `a.0`. Move
26+
paths roughly correspond to the concept of a [`Place`] from MIR, but
27+
they are indexed in ways that enable us to do move analysis more
28+
efficiently.
29+
30+
[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
31+
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
32+
33+
## Move path indices
34+
35+
Although there is a [`MovePath`] data structure, move paths are never
36+
referenced directly. Instead, all the code passes around *indices* of
37+
type
38+
[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html). If
39+
you need to get information about a move path, you use this index with
40+
the [`move_paths` field of the `MoveData`][move_paths]. For example,
41+
to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
42+
access the [`MovePath::place`] field like so:
43+
44+
```rust,ignore
45+
move_data.move_paths[mpi].place
46+
```
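
To make the index-based pattern concrete, here is a self-contained sketch. The types below are hypothetical stand-ins that only borrow the names `MovePathIndex`, `MovePath`, and `MoveData`; in particular, `place` is a plain `String` here rather than a MIR `Place`, and the `push` and `place` methods are invented for illustration.

```rust
/// Stand-in for `MovePathIndex`: a newtype around a vector index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MovePathIndex(usize);

/// Stand-in for `MovePath`; the real struct stores a MIR `Place`.
#[derive(Debug)]
pub struct MovePath {
    pub place: String,
}

/// Stand-in for `MoveData`: owns all move paths in one vector.
#[derive(Debug, Default)]
pub struct MoveData {
    move_paths: Vec<MovePath>,
}

impl MoveData {
    /// Adds a move path and hands back its index (hypothetical helper).
    pub fn push(&mut self, place: &str) -> MovePathIndex {
        let index = MovePathIndex(self.move_paths.len());
        self.move_paths.push(MovePath { place: place.to_string() });
        index
    }

    /// Looks the path up again through its index.
    pub fn place(&self, mpi: MovePathIndex) -> &str {
        &self.move_paths[mpi.0].place
    }
}
```

Passing around a small `Copy` index instead of a reference avoids borrowing the whole table while other analyses are using it.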
47+
48+
[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
49+
[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
50+
51+
## Building move paths
52+
53+
One of the first things we do in the MIR borrow check is to construct
54+
the set of move paths. This is done as part of the
55+
[`MoveData::gather_moves`] function. This function uses a MIR visitor
56+
called [`Gatherer`] to walk the MIR and look at how each [`Place`]
57+
within is accessed. For each such [`Place`], it constructs a
58+
corresponding [`MovePathIndex`]. It also records when/where that
59+
particular move path is moved/initialized, but we'll get to that in a
60+
later section.
61+
62+
[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
63+
[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
64+
65+
### Illegal move paths
66+
67+
We don't actually create a move-path for **every** [`Place`] that gets
68+
used. In particular, if it is illegal to move from a [`Place`], then
69+
there is no need for a [`MovePathIndex`]. Some examples:
70+
71+
- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
72+
for static variables.
73+
- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
74+
there would be no move-path for `foo[1]`.
75+
- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`,
76+
there would be no move-path for `*foo`.
77+
78+
These rules are enforced by the [`move_path_for`] function, which
79+
converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
80+
those just discussed, the function returns an `Err`. This in turn
81+
means we don't have to bother tracking whether those places are
82+
initialized (which lowers overhead).
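
The three rules can be illustrated with a small sketch. It is a toy model under stated assumptions: the `Place` and `IllegalMove` enums and the `check_movable` function are hypothetical, `Deref` here models only a dereference of a borrowed reference, and the real `move_path_for` returns a move path index rather than `()`.

```rust
/// Hypothetical, simplified place expression.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Place {
    Local(String),
    Static(String),
    /// `base.N`
    Field(Box<Place>, usize),
    /// `base[i]`
    Index(Box<Place>),
    /// `*base`, where `base` is a borrowed reference
    Deref(Box<Place>),
}

/// Hypothetical reasons why a place gets no move path.
#[derive(Debug, PartialEq, Eq)]
pub enum IllegalMove {
    Static,
    ArrayElement,
    BorrowedContent,
}

/// Returns `Ok(())` if a move path could exist for `place`.
pub fn check_movable(place: &Place) -> Result<(), IllegalMove> {
    match place {
        Place::Local(_) => Ok(()),
        Place::Static(_) => Err(IllegalMove::Static),
        // A field is movable only if its base is.
        Place::Field(base, _) => check_movable(base),
        Place::Index(_) => Err(IllegalMove::ArrayElement),
        Place::Deref(_) => Err(IllegalMove::BorrowedContent),
    }
}
```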
83+
84+
[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
85+
86+
## Looking up a move-path
87+
88+
If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you
89+
can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
90+
of [`MoveData`]. There are two different methods:
91+
92+
[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
93+
[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup
94+
95+
- [`find_local`], which takes a [`mir::Local`] representing a local
96+
variable. This is the easier method, because we **always** create a
97+
[`MovePathIndex`] for every local variable.
98+
- [`find`], which takes an arbitrary [`Place`]. This method is a bit
99+
more annoying to use, precisely because we don't have a
100+
[`MovePathIndex`] for **every** [`Place`] (as we just discussed in
101+
the "illegal move paths" section). Therefore, [`find`] returns a
102+
[`LookupResult`] indicating the closest path it was able to find
103+
that exists (e.g., for `foo[1]`, it might return just the path for
104+
`foo`).
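
The difference between the two lookups can be sketched like this. Everything below is a simplified, hypothetical model: locals are strings, `Elem` is invented, `find_local` returns an `Option` only because the toy table may not know the name, and `LookupResult::Parent` always carries an index, which may differ from the real enum.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MovePathIndex(usize);

/// Hypothetical projection step: `.N` or `[N]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Elem {
    Field(usize),
    Index(usize),
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupResult {
    /// A move path exists for exactly this place.
    Exact(MovePathIndex),
    /// Only a prefix of the place has a move path.
    Parent(MovePathIndex),
}

#[derive(Debug, Default)]
pub struct MovePathLookup {
    next: usize,
    locals: HashMap<String, MovePathIndex>,
    projections: HashMap<(MovePathIndex, Elem), MovePathIndex>,
}

impl MovePathLookup {
    fn fresh(&mut self) -> MovePathIndex {
        let mpi = MovePathIndex(self.next);
        self.next += 1;
        mpi
    }

    pub fn add_local(&mut self, local: &str) -> MovePathIndex {
        let mpi = self.fresh();
        self.locals.insert(local.to_string(), mpi);
        mpi
    }

    pub fn add_projection(&mut self, base: MovePathIndex, elem: Elem) -> MovePathIndex {
        let mpi = self.fresh();
        self.projections.insert((base, elem), mpi);
        mpi
    }

    pub fn find_local(&self, local: &str) -> Option<MovePathIndex> {
        self.locals.get(local).copied()
    }

    /// Follows `elems` from `local` as far as move paths exist.
    pub fn find(&self, local: &str, elems: &[Elem]) -> Option<LookupResult> {
        let mut current = self.find_local(local)?;
        for elem in elems {
            match self.projections.get(&(current, *elem)) {
                Some(&child) => current = child,
                None => return Some(LookupResult::Parent(current)),
            }
        }
        Some(LookupResult::Exact(current))
    }
}
```

With a table that knows `foo` and `foo.0`, looking up `foo[1]` in this model stops at `foo` and reports it as the closest existing path.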
105+
106+
[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find
107+
[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local
108+
[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html
109+
[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html
110+
111+
## Cross-references
112+
113+
As we noted above, move-paths are stored in a big vector and
114+
referenced via their [`MovePathIndex`]. However, within this vector,
115+
they are also structured into a tree. So for example if you have the
116+
[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path
117+
`a.b`. You can also iterate over all children paths: so, from `a.b`,
118+
you might iterate to find the path `a.b.c` (here you are iterating
119+
just over the paths that are **actually referenced** in the source,
120+
not all **possible** paths that could have been referenced). These
121+
references are used for example in the [`has_any_child_of`] function,
122+
which checks whether the dataflow results contain a value for the
123+
given move-path (e.g., `a.b`) or any child of that move-path (e.g.,
124+
`a.b.c`).
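
A tree stored inside a flat vector can be sketched as below. This is a minimal model, assuming a parent link plus a first-child/next-sibling chain per entry; the `add` and `parent` helpers are hypothetical, and this toy `has_any_child_of` takes a plain set and returns a `bool`, which need not match the real function's signature.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MovePathIndex(usize);

#[derive(Debug)]
pub struct MovePath {
    pub name: &'static str,
    pub parent: Option<MovePathIndex>,
    pub first_child: Option<MovePathIndex>,
    pub next_sibling: Option<MovePathIndex>,
}

#[derive(Debug, Default)]
pub struct MoveData {
    move_paths: Vec<MovePath>,
}

impl MoveData {
    /// Adds a path and links it in as the new first child of `parent`.
    pub fn add(&mut self, name: &'static str, parent: Option<MovePathIndex>) -> MovePathIndex {
        let mpi = MovePathIndex(self.move_paths.len());
        // The previous first child (if any) becomes our next sibling.
        let next_sibling = match parent {
            Some(p) => self.move_paths[p.0].first_child.replace(mpi),
            None => None,
        };
        self.move_paths.push(MovePath { name, parent, first_child: None, next_sibling });
        mpi
    }

    pub fn parent(&self, mpi: MovePathIndex) -> Option<MovePathIndex> {
        self.move_paths[mpi.0].parent
    }

    /// Returns true if `mpi` or any of its descendants is in `set`.
    pub fn has_any_child_of(&self, set: &HashSet<MovePathIndex>, mpi: MovePathIndex) -> bool {
        let mut todo = vec![mpi];
        while let Some(current) = todo.pop() {
            if set.contains(&current) {
                return true;
            }
            // Walk the sibling chain to enqueue every direct child.
            let mut child = self.move_paths[current.0].first_child;
            while let Some(c) = child {
                todo.push(c);
                child = self.move_paths[c.0].next_sibling;
            }
        }
        false
    }
}
```

Only paths that were explicitly added appear in the tree, which mirrors the point that iteration covers the paths actually referenced rather than every possible one.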
125+
126+
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
127+
[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of
128+

src/mir/regionck.md renamed to src/borrow_check/region_inference.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
1-
# MIR-based region checking (NLL)
1+
# Region inference (NLL)
22

33
The MIR-based region checking code is located in
44
[the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course,
55
stands for "non-lexical lifetimes", a term that will hopefully be
66
deprecated once they become the standard kind of lifetime.)
77

8-
[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll
8+
[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html
99

1010
The MIR-based region analysis consists of two major functions:
1111

src/borrow_check/type_check.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# The MIR type-check
2+
3+
A key component of the borrow check is the
4+
[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html).
5+
This check walks the MIR and does a complete "type check" -- the same
6+
kind you might find in any other language. In the process of doing
7+
this type-check, we also uncover the region constraints that apply to
8+
the program.
9+
10+
TODO -- elaborate further? Maybe? :)

src/mir/borrowck.md

Lines changed: 0 additions & 59 deletions
This file was deleted.
