| 1 | +- Feature Name: guaranteed_slice_repr |
| 2 | +- Start Date: 2025-02-18 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC guarantees the in-memory representation of slice and str references. |
| 10 | +Specifically, `&[T]` and `&mut [T]` are guaranteed to have the same layout as: |
| 11 | + |
| 12 | +```rust |
| 13 | +#[repr(C)] |
| 14 | +struct Slice<T> { |
| 15 | +    data: *const T, |
| 16 | +    len: usize, |
| 17 | +} |
| 18 | +``` |
| 19 | + |
| 20 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of |
| 21 | +`&mut str` is the same as that of `&mut [u8]`. |
| 22 | + |
| 23 | +# Motivation |
| 24 | +[motivation]: #motivation |
| 25 | + |
| 26 | +This RFC allows non-Rust (e.g. C or C++) code to read from or write to existing |
| 27 | +slices and to declare slice fields or locals. |
| 28 | + |
| 29 | +For example, guaranteeing the representation of slice references allows |
| 30 | +non-Rust code to read from the `data` or `len` fields of `string` in the type |
| 31 | +below without intermediate FFI calls into Rust: |
| 32 | + |
| 33 | +```rust |
| 34 | +#[repr(C)] |
| 35 | +struct HasString { |
| 36 | +    string: &'static str, |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +Note: prior to this RFC, the type above is not even properly `repr(C)`, since the |
| 41 | +size and alignment of slice references are not guaranteed. However, the Rust compiler |
| 42 | +accepts the `repr(C)` declaration above without warning. |
| 43 | + |
| 44 | +# Guide-level explanation |
| 45 | +[guide-level-explanation]: #guide-level-explanation |
| 46 | + |
| 47 | +Slice references are represented with a pointer and length pair. Their in-memory |
| 48 | +layout is the same as a `#[repr(C)]` struct like the following: |
| 49 | + |
| 50 | +```rust |
| 51 | +#[repr(C)] |
| 52 | +struct Slice<T> { |
| 53 | +    data: *const T, |
| 54 | +    len: usize, |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +The precise ABI of slice references is not guaranteed, so `&[T]` should not be |
| 59 | +passed to or returned from an `extern "C" fn` by value. |
| 60 | + |
| 61 | +The validity requirements for the in-memory representation of slice references |
| 62 | +are the same as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html) for shared slice references, and |
| 63 | +[those documented on `std::slice::from_raw_parts_mut`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts_mut.html) |
| 64 | +for mutable slice references. |
| 65 | + |
| 66 | +Namely: |
| 67 | + |
| 68 | +* `data` must be non-null, valid for reads (for shared references) or writes |
| 69 | + (for mutable references) for `len * mem::size_of::<T>()` many bytes, |
| 70 | + and it must be properly aligned. This means in particular: |
| 71 | + |
| 72 | + * The entire memory range of this slice must be contained within a single allocated object! |
| 73 | + Slices can never span across multiple allocated objects. |
| 74 | + * `data` must be non-null and aligned even for zero-length slices or slices of ZSTs. One |
| 75 | + reason for this is that enum layout optimizations may rely on references |
| 76 | + (including slices of any length) being aligned and non-null to distinguish |
| 77 | + them from other data. You can obtain a pointer that is usable as `data` |
| 78 | + for zero-length slices using [`NonNull::dangling()`](https://doc.rust-lang.org/std/ptr/struct.NonNull.html#method.dangling). |
| 79 | + |
| 80 | +* `data` must point to `len` consecutive properly initialized values of type `T`. |
| 81 | + |
| 82 | +* The total size `len * mem::size_of::<T>()` of the slice must be no larger than `isize::MAX`, |
| 83 | + and adding that size to `data` must not "wrap around" the address space. |
| 84 | + See the safety documentation of [`pointer::offset`](https://doc.rust-lang.org/std/primitive.pointer.html#method.offset). |
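
As a non-normative illustration of the requirements above, the sketch below splits a slice reference into its pointer and length, rebuilds it with `std::slice::from_raw_parts`, and produces valid parts for an empty slice using `NonNull::dangling()`. The names `RawSlice`, `to_raw`, `from_raw`, and `empty_raw` are hypothetical and exist only for this example; they are not part of the RFC or the standard library.

```rust
use std::ptr::NonNull;
use std::slice;

/// Hypothetical mirror of the pointer + length layout described above.
#[repr(C)]
pub struct RawSlice<T> {
    pub data: *const T,
    pub len: usize,
}

/// Splits a shared slice reference into its data pointer and length.
pub fn to_raw<T>(s: &[T]) -> RawSlice<T> {
    RawSlice { data: s.as_ptr(), len: s.len() }
}

/// Rebuilds a shared slice from its parts.
///
/// # Safety
/// The caller must uphold every requirement listed above: `raw.data` is
/// non-null, aligned, and valid for reads of `raw.len` initialized `T`s
/// within a single allocation, for the whole lifetime `'a`.
pub unsafe fn from_raw<'a, T>(raw: RawSlice<T>) -> &'a [T] {
    unsafe { slice::from_raw_parts(raw.data, raw.len) }
}

/// Returns parts that describe a valid empty slice: the pointer is
/// dangling, but it is still non-null and aligned for `T`.
pub fn empty_raw<T>() -> RawSlice<T> {
    RawSlice { data: NonNull::<T>::dangling().as_ptr() as *const T, len: 0 }
}
```

For instance, `unsafe { from_raw(to_raw(&[1u32, 2, 3])) }` yields the original elements again, and `unsafe { from_raw(empty_raw::<u64>()) }` is an empty slice.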
| 85 | + |
| 86 | +## `str` |
| 87 | + |
| 88 | +The layout of `&str` is the same as that of `&[u8]`, and the layout of |
| 89 | +`&mut str` is the same as that of `&mut [u8]`. More generally, `str` behaves like |
| 90 | +`#[repr(transparent)] struct str([u8]);`. Safe Rust functions may assume that |
| 91 | +`str` holds valid UTF-8, but [it is not immediate undefined behavior to store |
| 92 | +non-UTF-8 data in `str`](https://doc.rust-lang.org/std/primitive.str.html#invariant). |
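
The relationship between `&str` and `&[u8]` can be sketched as follows. This is an illustrative example only; the function names `str_ref_matches_byte_slice_ref` and `round_trip` are hypothetical. It compares the size and alignment of the two reference types and rebuilds a `&str` from its pointer and length.

```rust
use std::mem::{align_of, size_of};
use std::slice;
use std::str;

/// Returns true if `&str` and `&[u8]` have the same size and alignment.
pub fn str_ref_matches_byte_slice_ref() -> bool {
    size_of::<&str>() == size_of::<&[u8]>() && align_of::<&str>() == align_of::<&[u8]>()
}

/// Takes a `&str` apart into pointer and length, then puts it back together.
pub fn round_trip(s: &str) -> &str {
    let data: *const u8 = s.as_ptr();
    let len: usize = s.len();
    // SAFETY: `data` and `len` come from a live `&str`, so they describe
    // `len` initialized bytes that are valid UTF-8.
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(data, len)) }
}
```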
| 93 | + |
| 94 | +## Pointers |
| 95 | + |
| 96 | +Raw pointers to slices such as `*const [T]` or `*mut str` use the same layout |
| 97 | +as slice references, but do not necessarily point to anything. |
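
A small sketch of the difference, assuming nothing beyond the standard library: a raw slice pointer may be built from a null data pointer without any `unsafe`, because nothing is dereferenced. The function names below are hypothetical.

```rust
use std::mem::size_of;
use std::ptr;

/// Builds a raw slice pointer whose data pointer is null. This is fine for
/// a raw pointer, but the same parts would be invalid for `&[u8]`.
pub fn null_slice_ptr() -> *const [u8] {
    ptr::slice_from_raw_parts(ptr::null::<u8>(), 3)
}

/// Returns true if raw slice pointers have the same size as slice references.
pub fn raw_and_ref_same_size() -> bool {
    size_of::<*const [u8]>() == size_of::<&[u8]>()
        && size_of::<*mut str>() == size_of::<&mut str>()
}
```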
| 98 | + |
| 99 | +# Drawbacks |
| 100 | +[drawbacks]: #drawbacks |
| 101 | + |
| 102 | +## Zero-sized types |
| 103 | + |
| 104 | +One could imagine representing `&[T]` as only `len` for zero-sized `T`. |
| 105 | +This proposal would preclude that choice in favor of a standard representation |
| 106 | +for slices regardless of the underlying type. |
| 107 | + |
| 108 | +Alternatively, we could choose to guarantee that the data pointer is present if |
| 109 | +and only if `size_of::<T>() != 0`. This could break existing |
| 110 | +code which smuggles pointers through the `data` value in `from_raw_parts` / |
| 111 | +`into_raw_parts`. |
| 112 | + |
| 113 | +## Uninhabited types |
| 114 | + |
| 115 | +Similarly, we could be *extra* tricky and make `&[!]` or other `&[Uninhabited]` |
| 116 | +types into a ZST since the slice can only ever be length zero. |
| 117 | + |
| 118 | +If we want to maintain the pointer field, we could also make `&[!]` *just* a |
| 119 | +pointer since we know the length can only be zero. |
| 120 | + |
| 121 | +Either option may offer modest performance benefits for highly generic code |
| 122 | +which happens to create empty slices of uninhabited types, but this is unlikely |
| 123 | +to be worth the cost of maintaining a special case. |
| 124 | + |
| 125 | +## Compatibility with C++ `std::span` |
| 126 | + |
| 127 | +The largest drawback of this layout and set of validity requirements is that it |
| 128 | +may preclude `&[T]` from being representationally equivalent to C++'s |
| 129 | +`std::span<T, std::dynamic_extent>`. |
| 130 | + |
| 131 | +* `std::span` does not currently guarantee its layout. In practice, pointer + length |
| 132 | + is the common representation. This is even observable using `is_layout_compatible` |
| 133 | + [on MSVC](https://godbolt.org/z/Y8ardrshY), though not |
| 134 | + [on GCC](https://godbolt.org/z/s4v4xehnG) nor |
| 135 | + [on Clang](https://godbolt.org/z/qsd1K5oGq). Future changes to guarantee a |
| 136 | + different layout in the C++ standard (unlikely due to MSVC ABI stability |
| 137 | + requirements) could preclude matching the layout with `&[T]`. |
| 138 | + |
| 139 | +* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One |
| 140 | + possible workaround for this would be to guarantee that `Option<&[T]>` uses |
| 141 | + `data: std::ptr::null(), len: 0` to represent the `None` case, making |
| 142 | + `std::span<T>` equivalent to `Option<&[T]>` for non-zero-sized types. |
| 143 | + |
| 144 | + Note that this is not currently the case. The compiler currently represents |
| 145 | + `None::<&[u8]>` as `data: std::ptr::null(), len: uninit` (though this is |
| 146 | + not guaranteed). |
| 147 | + |
| 148 | +* Rust uses a dangling pointer in the representation of zero-length slices. |
| 149 | + It's unclear whether C++ guarantees that a dangling pointer will remain |
| 150 | + unchanged when passed through `std::span`. However, it does support |
| 151 | + dangling pointers during regular construction via the use of |
| 152 | + [`std::to_address`](https://en.cppreference.com/w/cpp/container/span/span) |
| 153 | + in the iterator constructors. |
| 154 | + |
| 155 | +Note that C++ also does not support zero-sized types, so there is no naive way |
| 156 | +to represent types like `std::span<SomeZeroSizedRustType>`. |
| 157 | + |
| 158 | +## Flexibility |
| 159 | + |
| 160 | +Additionally, guaranteeing layout of Rust-native types limits the compiler's and |
| 161 | +standard library's ability to change and take advantage of new optimization |
| 162 | +opportunities. |
| 163 | + |
| 164 | +# Rationale and alternatives |
| 165 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 166 | + |
| 167 | +* We could avoid committing to a particular representation for slices. |
| 168 | + |
| 169 | +* We could try to guarantee layout compatibility with a particular target's |
| 170 | + `std::span` representation, though without standardization this may be |
| 171 | + impossible. Multiple different C++ stdlib implementations may be used on |
| 172 | + the same platform and could potentially have different span representations. |
| 173 | + In practice, current span representations also use ptr+len pairs. |
| 174 | + |
| 175 | +* We could avoid storing a data pointer for zero-sized types. This would result |
| 176 | + in a more compact representation but would mean that the representation of |
| 177 | + `&[T]` is dependent on the type of `T`. Additionally, this would break |
| 178 | + existing code which depends on storing data in the pointer of ZST slices. |
| 179 | + |
| 180 | + This would break popular crates such as [bitvec](https://docs.rs/crate/bitvec/1.0.1/source/doc/ptr/BitSpan.md) |
| 181 | + (55 million downloads) and would result in strange behavior such as |
| 182 | + `std::ptr::slice_from_raw_parts(ptr, len).as_ptr()` returning a different |
| 183 | + pointer from the one that was passed in. |
| 184 | + |
| 185 | + Types like `*const ()` / `&()` are widely used to pass around pointers today. |
| 186 | + We cannot make them zero-sized, and it would be surprising to make a |
| 187 | + different choice for `&[()]`. |
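
The pointer-preserving behavior that such code relies on can be sketched as below. This is a minimal illustration with a hypothetical function name; it reads the data pointer back with a pointer cast.

```rust
use std::ptr;

/// Wraps `addr` in a raw slice of zero-sized `()` elements and returns the
/// data pointer read back from that slice pointer.
pub fn zst_slice_round_trip(addr: *const (), len: usize) -> *const () {
    let fat: *const [()] = ptr::slice_from_raw_parts(addr, len);
    // Casting the slice pointer to an element pointer drops the length and
    // keeps the data pointer.
    fat as *const ()
}
```

With the layout proposed here, the returned pointer equals `addr` even though the element type is zero-sized; a representation that dropped the data pointer for ZSTs could not preserve it.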
| 188 | + |
| 189 | + |
| 190 | +# Prior art |
| 191 | +[prior-art]: #prior-art |
| 192 | + |
| 193 | +The layout in this RFC is already documented in |
| 194 | +[the Unsafe Code Guidelines Reference](https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html). |
| 195 | + |
| 196 | +# Future possibilities |
| 197 | +[future-possibilities]: #future-possibilities |
| 198 | + |
| 199 | +* Consider defining a separate Rust type which is repr-equivalent to the platform's |
| 200 | + native `std::span<T, std::dynamic_extent>` to allow for easier |
| 201 | + interoperability with C++ APIs. Unfortunately, the C++ standard does not |
| 202 | + guarantee the layout of `std::span` (though the representation may be known |
| 203 | + and fixed on a particular implementation, e.g. libc++/libstdc++/MSVC). |
| 204 | + Zero-sized types would also not be supported with a naive implementation of |
| 205 | + such a type. |