From 626b582e633bf2092deca6b344dbd3f8cfd2756e Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Thu, 12 Apr 2018 18:04:02 +0900
Subject: [PATCH 1/8] Zero Page Optimization

---
 text/0000-zero-page-optimization.md | 123 ++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100644 text/0000-zero-page-optimization.md

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
new file mode 100644
index 00000000000..4707b2e8a3a
--- /dev/null
+++ b/text/0000-zero-page-optimization.md
@@ -0,0 +1,123 @@
+- Feature Name: zero_page_optimization
+- Start Date: 2018-04-09
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+Extend the null pointer optimization to any value inside the zero page (which a
+reference cannot have the value).
+
+# Motivation
+[motivation]: #motivation
+
+Modern operating systems normally [traps null pointer access](https://en.wikipedia.org/wiki/Zero_page).
+This means valid pointers will never take values inside the zero page, and we
+can exploit this for ~12 bits of storage for secondary variants.
+
+Inside Rust std, we rely on the assumption that zero page exists:
+
+https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238
+
+However, this is not something that is documented in the nomicon, neither it's
+always true. For instance, microcontrollers without MMU doesn't implement such
+guards at all, and `0` is a valid address where the entrypoint lies. See
+[Cortex-M4](https://developer.arm.com/docs/ddi0439/latest/programmers-model/system-address-map)'s
+design as one of such example.
+
+To make things worse, such usage is also seen outside std, on crates that compile
+on stable Rust:
+
+https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169
+
+Such crates should not assume anything regarding Rust ABI internals, but in the
+case of this `BiLock`, we rely on compressing it into a usize so we can perform
+atomic operations without a mutex. Of course, this code has risk to break if
+used on microcontrollers.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+This change should be transparent for most users; the following description is
+targeted at people dealing with FFI or unsafe.
+
+The recently stabilized `NonNull` type will have more strict requirements:
+the pointer must be not in the null page, and it must be valid to dereference.
+
+`&T`, `&mut T`, `NonNull` will have the same ranging semantics:
+they will not take any value inside the zero page. We will optimize the layout
+of an enumeration in a way similar to before, except that we will allow
+discriminants of up to the zero page size (typically 4095).
+
+Also, attempts to compress discriminants will be performed: which means, an
+`Option<Option<&T>>` will be flattened internally, so its layout will be similar
+to:
+
+```rust
+enum ... {
+    NoneInner, // discriminant 0
+    NoneOuter, // discriminant 1
+    Some(&T) // remainder
+}
+```
+
+Note that here, we assign discriminants from inner to outer. This makes the
+representation match when a reference is taken.
+
+The exact behavior of this optimization should be documented upon implementation,
+for unsafe coding usage.
+
+To take advantage of zero page optimization, use `transmute` from and to usize.
+This will cause compilation to fail if such optimization is not permitted on
+the target.
+
+An `zero_page_size` `#[cfg]` attribute will also be exposed, to code a fallback
+instead of failing in cases like above.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+We will add a target-specific constant to determine the availability and size
+of the zero page. The zero page range starts from 0, and must be at least one
+byte so that old code relying on null pointer optimization will not break.
+
+For the defined range, the compiler must ensure that no pointer of which value
+is inside the range could be created safely. On microcontrollers, a dumb solution
+would be creating a nop sled at the entrypoint.
+
+This optimization only applies to pointer-like values (which can be dereferenced),
+and `std::num::NonZero` keeps its current behavior.
+
+The pointer internals will be also adopted to use this scheme: `Unique` should
+be refactored to use an enum internally.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+- This can create discrimination between platforms, although whether it's preferred
+over undefined behavior is debatable.
+- Compressing discriminant is not very straightforward.
+
+# Rationale and alternatives
+[alternatives]: #alternatives
+
+## On the "null range"
+
+- If we allow "none" to be set as the zero page range, it will make `Option<&T>`'s
+layout Rust specific, which can't be used in FFI anymore. On microcontrollers
+FFI should still be possible, so such breaking change isn't acceptable.
+- We can also allow a very big value to use as "invalid page" range. However, this
+may be incompatible with our current internals where `0` is considered `null`.
+
+# Prior art
+[prior-art]: #prior-art
+
+Not applicable: Null pointer optimization is Rust specific, and this enhancement
+is Rust specific too.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+- Can we suggest a better alternative than `transmute`? `transmute` is too
+error prone despite we're trying to make the code more "safe".
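Reviewer note on patch 1: the flattened layout from the guide-level explanation can be modelled by hand, which may make the discriminant numbering easier to follow. This is a minimal sketch, not the proposed compiler behaviour. The names `ZERO_PAGE_SIZE`, `Nested`, `encode` and `decode` are hypothetical, and the 4096-byte zero page is an assumption about the host, since the RFC leaves the real size target-specific.

```rust
/// Assumed zero page size for this sketch (hypothetical; target-specific in the RFC).
const ZERO_PAGE_SIZE: usize = 4096;

/// Hand-written stand-in for the flattened `Option<Option<&T>>` layout.
#[derive(Debug, PartialEq)]
enum Nested<'a, T> {
    NoneInner,   // stored as the word 0
    NoneOuter,   // stored as the word 1
    Some(&'a T), // stored as the address, which must lie outside the zero page
}

/// Pack the value into a single pointer-sized word.
fn encode<T>(value: &Nested<'_, T>) -> usize {
    match value {
        Nested::NoneInner => 0,
        Nested::NoneOuter => 1,
        Nested::Some(r) => {
            let addr = *r as *const T as usize;
            // The whole scheme relies on this never firing on the target.
            assert!(addr >= ZERO_PAGE_SIZE, "reference lies inside the assumed zero page");
            addr
        }
    }
}

/// Unpack a word produced by `encode`.
///
/// Safety: `word` must come from `encode`, and the referenced value must
/// still be alive for the lifetime `'a`.
unsafe fn decode<'a, T>(word: usize) -> Nested<'a, T> {
    match word {
        0 => Nested::NoneInner,
        1 => Nested::NoneOuter,
        addr => {
            debug_assert!(addr >= ZERO_PAGE_SIZE);
            Nested::Some(unsafe { &*(addr as *const T) })
        }
    }
}
```

Under these assumptions, the two `None` variants and the reference share one `usize`, which is the property `BiLock`-style code wants for atomic operations. On a target where address 0 or 1 is a valid object address, the assertion in `encode` is what would fail.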
From 11a514648b456ec0091aacd15500f180b76ea0d4 Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Fri, 13 Apr 2018 16:26:06 +0900
Subject: [PATCH 2/8] Reword motivation

---
 text/0000-zero-page-optimization.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 4707b2e8a3a..1cf2b467b60 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -16,25 +16,27 @@ Modern operating systems normally [traps null pointer access](https://en.wikiped
 This means valid pointers will never take values inside the zero page, and we
 can exploit this for ~12 bits of storage for secondary variants.
 
-Inside Rust std, we rely on the assumption that zero page exists:
+[Inside Rust std](https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238),
+we use a "dangling" pointer for ZST allocations; this involves a somewhat
+verbose logic.
 
-https://github.com/rust-lang/rust/blob/ca26ef321c44358404ef788d315c4557eb015fb2/src/liballoc/heap.rs#L238
+Outside std, we also see `futures-util`
+[uses 1](https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169)
+as a special pointer value.
 
 However, this is not something that is documented in the nomicon, neither it's
 always true. For instance, microcontrollers without MMU doesn't implement such
-guards at all, and `0` is a valid address where the entrypoint lies. See
+guards at all, and `0` and `1` is a valid address where the entrypoint lies. See
 [Cortex-M4](https://developer.arm.com/docs/ddi0439/latest/programmers-model/system-address-map)'s
 design as one of such example.
 
-To make things worse, such usage is also seen outside std, on crates that compile
-on stable Rust:
-
-https://github.com/rust-lang-nursery/futures-rs/blob/856fde847d4062f5d2af5d85d6640028297a10f1/futures-util/src/lock.rs#L157-L169
-
 Such crates should not assume anything regarding Rust ABI internals, but in the
 case of this `BiLock`, we rely on compressing it into a usize so we can perform
-atomic operations without a mutex. Of course, this code has risk to break if
-used on microcontrollers.
+atomic operations without a mutex. In practice, the entrypoint at `0` is
+unlikely to be filled with Rust code but platform-specific bootstrap assembly.
+Also, other factors like alignment also get involved so in practice we can't
+collide the address. However, this RFC proposes a more logical and typed way
+to code such things.
 
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation

From c82a2461bed1daa3d32090ba5a1c6b9308ed2f38 Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Fri, 13 Apr 2018 16:57:01 +0900
Subject: [PATCH 3/8] Expose attribute for configuring size

---
 text/0000-zero-page-optimization.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 1cf2b467b60..797e157652d 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -74,13 +74,16 @@ To take advantage of zero page optimization, use `transmute` from and to usize.
 This will cause compilation to fail if such optimization is not permitted on
 the target.
 
+An crate attribute `zero_page_size` will be exposed for configuring the exact
+size of the zero page. This is mainly targeted at microcontroller runtimes.
+
 An `zero_page_size` `#[cfg]` attribute will also be exposed, to code a fallback
 instead of failing in cases like above.
 
 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation
 
-We will add a target-specific constant to determine the availability and size
+We will add a target-specific default to determine the availability and size
 of the zero page. The zero page range starts from 0, and must be at least one
 byte so that old code relying on null pointer optimization will not break.
 

From 67ef48050f7cf3058af5de2d7573266775069587 Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Fri, 13 Apr 2018 17:11:53 +0900
Subject: [PATCH 4/8] Update NonNull requirements

---
 text/0000-zero-page-optimization.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 797e157652d..89f2b8ef57e 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -45,7 +45,11 @@ This change should be transparent for most users; the following description is
 targeted at people dealing with FFI or unsafe.
 
 The recently stabilized `NonNull` type will have more strict requirements:
-the pointer must be not in the null page, and it must be valid to dereference.
+the pointer must be not in the null page. `NonNull::dangling` will be
+deprecated in favor of this optimization.
+
+During the migration, we should migrate the impact with a crater run. If changing
+the behavior directly is unacceptable, then we'll have to create a new type instead.
 
 `&T`, `&mut T`, `NonNull` will have the same ranging semantics:
 they will not take any value inside the zero page. We will optimize the layout

From 3b9864ccd91c32c212ed2888a7d3f0b89a9194bd Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Fri, 13 Apr 2018 17:13:52 +0900
Subject: [PATCH 5/8] Add unresolved questions

---
 text/0000-zero-page-optimization.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 89f2b8ef57e..64de78d7aa7 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -130,3 +130,7 @@ is Rust specific too.
 
 - Can we suggest a better alternative than `transmute`? `transmute` is too
 error prone despite we're trying to make the code more "safe".
+- We can also store data in the lower bits of pointer, utilizing the alignemnt
+requirement. Also, amd64 pointers are 48-bit technically, so we may also exploit
+the space. These optimizations are less portable, and should be filed in another
+RFC.
\ No newline at end of file

From a529635cf11e80081a05a44d99fba494311b41fb Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Mon, 16 Apr 2018 17:28:20 +0900
Subject: [PATCH 6/8] Reword internal refactoring

---
 text/0000-zero-page-optimization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 64de78d7aa7..7a9a0056e1c 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -98,8 +98,8 @@ would be creating a nop sled at the entrypoint.
 This optimization only applies to pointer-like values (which can be dereferenced),
 and `std::num::NonZero` keeps its current behavior.
 
-The pointer internals will be also adopted to use this scheme: `Unique` should
-be refactored to use an enum internally.
+We should refactor the allocator related code to prefer enumerations over
+`NonNull::dangling`.
 
 # Drawbacks
 [drawbacks]: #drawbacks

From 3167c541b063df11840652fbb88e704751d657b8 Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Mon, 16 Apr 2018 18:38:06 +0900
Subject: [PATCH 7/8] Newtype revamp

Introduce a new type that can benefit from more optimizations.
---
 text/0000-zero-page-optimization.md | 43 +++++++++++++++++------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 7a9a0056e1c..28bcf71d99f 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -44,17 +44,19 @@ to code such things.
 This change should be transparent for most users; the following description is
 targeted at people dealing with FFI or unsafe.
 
-The recently stabilized `NonNull` type will have more strict requirements:
-the pointer must be not in the null page. `NonNull::dangling` will be
-deprecated in favor of this optimization.
+A new type, `Shared` is (re-)introduced: `Shared` wraps a `*mut T` and
+must store a pointer to valid memory allocated for the correct type. This
+allows the compiler to assume that the pointer is not inside the zero page,
+plus it allows further optimization to be implemented like using the lower bits
+of the pointer by exploiting the alignment requirement.
 
-During the migration, we should migrate the impact with a crater run. If changing
-the behavior directly is unacceptable, then we'll have to create a new type instead.
+`&T`, `&mut T`, `Shared` will have the same ranging semantics, as described
+above. Plus, the following optimizations will also be done:
 
-`&T`, `&mut T`, `NonNull` will have the same ranging semantics:
-they will not take any value inside the zero page. We will optimize the layout
-of an enumeration in a way similar to before, except that we will allow
-discriminants of up to the zero page size (typically 4095).
+- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
+the inner raw pointer. `0` is a good candidate here because we don't actually
+store it, we don't have to worry about it conflicting with the optimization.
+- These types will be inhabitable if `T` is inhabitable.
 
 Also, attempts to compress discriminants will be performed: which means, an
 `Option<Option<&T>>` will be flattened internally, so its layout will be similar
@@ -74,6 +76,14 @@ representation match when a reference is taken.
 The exact behavior of this optimization should be documented upon implementation,
 for unsafe coding usage.
 
+The discriminant compression is primarily intended for pointers, but for saving
+memory, it should also apply to the following cases:
+
+- For enums that only contains one variant which can contain value.
+- For structs that hold such enum as the first element. Here, the first element
+is considered after reordering. This allows `Option<Vec<T>>` to remain at the
+size of 3 pointers, for example.
+
 To take advantage of zero page optimization, use `transmute` from and to usize.
 This will cause compilation to fail if such optimization is not permitted on
 the target.
@@ -95,11 +105,11 @@ For the defined range, the compiler must ensure that no pointer of which value
 is inside the range could be created safely. On microcontrollers, a dumb solution
 would be creating a nop sled at the entrypoint.
 
-This optimization only applies to pointer-like values (which can be dereferenced),
-and `std::num::NonZero` keeps its current behavior.
-
-We should refactor the allocator related code to prefer enumerations over
-`NonNull::dangling`.
+We should refactor the allocation related code to prefer enumerations over
+`NonNull::dangling`. Taking `RawVec` code as an example, we would use
+`Option<Shared<T>>` to store the internal pointer. For ZST, we initialize
+with an arbitrary value (as we don't store it); for zero-length vector, we make
+use of the `None` variant to indicate that we didn't allocate.
 
 # Drawbacks
 [drawbacks]: #drawbacks
@@ -130,7 +140,4 @@ is Rust specific too.
 
 - Can we suggest a better alternative than `transmute`? `transmute` is too
 error prone despite we're trying to make the code more "safe".
-- We can also store data in the lower bits of pointer, utilizing the alignemnt
-requirement. Also, amd64 pointers are 48-bit technically, so we may also exploit
-the space. These optimizations are less portable, and should be filed in another
-RFC.
\ No newline at end of file
+- `Shared` wasn't a good name; we may want a better name for the new type.
\ No newline at end of file

From 935d62e630916541fbe0f2ccb97ac37dbccfaffb Mon Sep 17 00:00:00 2001
From: Tatsuyuki Ishi
Date: Sat, 21 Apr 2018 15:26:55 +0900
Subject: [PATCH 8/8] Remove ZST optimization

---
 text/0000-zero-page-optimization.md | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/text/0000-zero-page-optimization.md b/text/0000-zero-page-optimization.md
index 28bcf71d99f..8c381a770fe 100644
--- a/text/0000-zero-page-optimization.md
+++ b/text/0000-zero-page-optimization.md
@@ -51,14 +51,7 @@ plus it allows further optimization to be implemented like using the lower bits
 of the pointer by exploiting the alignment requirement.
 
 `&T`, `&mut T`, `Shared` will have the same ranging semantics, as described
-above. Plus, the following optimizations will also be done:
-
-- These types will be ZST if `T` is ZST. An arbitrary constant is returned as
-the inner raw pointer. `0` is a good candidate here because we don't actually
-store it, we don't have to worry about it conflicting with the optimization.
-- These types will be inhabitable if `T` is inhabitable.
-
-Also, attempts to compress discriminants will be performed: which means, an
+above. Also, attempts to compress discriminants will be performed: which means, an
 `Option<Option<&T>>` will be flattened internally, so its layout will be similar
 to:
 
@@ -107,9 +100,7 @@ would be creating a nop sled at the entrypoint.
 
 We should refactor the allocation related code to prefer enumerations over
 `NonNull::dangling`. Taking `RawVec` code as an example, we would use
-`Option<Shared<T>>` to store the internal pointer. For ZST, we initialize
-with an arbitrary value (as we don't store it); for zero-length vector, we make
-use of the `None` variant to indicate that we didn't allocate.
+`Option<Shared<T>>` to store the internal pointer.
 
 # Drawbacks
 [drawbacks]: #drawbacks
@@ -140,4 +131,4 @@ is Rust specific too.
 
 - Can we suggest a better alternative than `transmute`? `transmute` is too
 error prone despite we're trying to make the code more "safe".
-- `Shared` wasn't a good name; we may want a better name for the new type.
\ No newline at end of file
+- `Shared` wasn't a good name; we may want a better name for the new type.
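Reviewer note on the series as a whole: the `RawVec` refactoring mentioned in the reference-level explanation might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual `RawVec` code. `RawBuf` and `pointer_sized` are hypothetical names, today's `std::ptr::NonNull<T>` stands in for the proposed `Shared<T>`, and treating `None` as "nothing allocated" is an assumption carried over from patch 7 (patch 8 no longer spells it out).

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical, simplified buffer; `NonNull<T>` stands in for the proposed `Shared<T>`.
struct RawBuf<T> {
    /// `None` means "nothing allocated", so no dangling sentinel is needed.
    ptr: Option<NonNull<T>>,
    cap: usize,
}

impl<T> RawBuf<T> {
    fn new() -> Self {
        RawBuf { ptr: None, cap: 0 }
    }

    fn with_capacity(cap: usize) -> Self {
        let layout = Layout::array::<T>(cap).expect("capacity overflow");
        if layout.size() == 0 {
            // Zero-sized request (cap == 0 or zero-sized T): do not allocate.
            return RawBuf { ptr: None, cap };
        }
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc(layout) } as *mut T;
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        RawBuf { ptr: Some(ptr), cap }
    }

    fn is_allocated(&self) -> bool {
        self.ptr.is_some()
    }
}

impl<T> Drop for RawBuf<T> {
    fn drop(&mut self) {
        if let Some(ptr) = self.ptr {
            let layout = Layout::array::<T>(self.cap).expect("capacity overflow");
            // SAFETY: `ptr` was returned by `alloc` with this same layout.
            unsafe { dealloc(ptr.as_ptr() as *mut u8, layout) };
        }
    }
}

/// The existing null pointer optimization keeps the `Option` wrapper pointer-sized.
fn pointer_sized<T>() -> bool {
    size_of::<Option<NonNull<T>>>() == size_of::<*mut T>()
}
```

With the existing null pointer optimization the `Option` wrapper already costs no extra space. What the RFC would add is room for further variants in the same word, for example when such a buffer is itself wrapped in another `Option`.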