Commit a6f45ee

WIP PROOF-OF-CONCEPT: experiment with very strict pointer provenance
This patch series examines the question: how bad would it be if we adopted an extremely strict pointer provenance model that completely banished all int<->ptr casts. The key insight to making this approach even *vaguely* palatable is the ptr.with_addr(addr) -> ptr function, which takes a pointer and an address and creates a new pointer with that address and the provenance of the input pointer. In this way the "chain of custody" is completely and dynamically restored, making the model suitable even for dynamic checkers like CHERI and Miri. This is not a formal model, but lots of the docs discussing the model have been updated to try to convey the *concept* of this design in the hopes that it can be iterated on. Many new methods have been added to ptr to attempt to fill in semantic gaps that this introduces, or to just get the ball rolling on "hey this is a problem that needs to be solved, here's a bad solution as a starting point".
1 parent eded76b commit a6f45ee
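
As a rough illustration of the `with_addr` idea described in the commit message, here is a minimal sketch that runs on stable Rust. The free functions `addr`, `with_addr`, and `split_tag` are hypothetical stand-ins written for this example (the commit adds `addr` and `with_addr` as unstable methods on raw pointers); `with_addr` is emulated with `wrapping_offset`, which is the same polyfill the diff uses, and the low-bit tag is an assumption that only holds for types with alignment of at least 2.

```rust
/// Hypothetical stand-in for the `addr` method added in this commit:
/// returns only the address part of a thin pointer.
fn addr<T>(ptr: *const T) -> usize {
    ptr as usize
}

/// Hypothetical stand-in for `with_addr`: produces a pointer with the
/// address `new_addr` that is still derived from `ptr`, emulated with a
/// byte-wise `wrapping_offset` as in the commit's polyfill.
fn with_addr<T>(ptr: *const T, new_addr: usize) -> *const T {
    let offset = (new_addr as isize).wrapping_sub(addr(ptr) as isize);
    ptr.cast::<u8>().wrapping_offset(offset).cast::<T>()
}

/// Splits a pointer tagged in its lowest bit into (tag, untagged pointer).
/// Assumes `T` has alignment >= 2, so the lowest address bit is free.
fn split_tag<T>(tagged: *const T) -> (bool, *const T) {
    let a = addr(tagged);
    let has_tag = (a & 0x1) != 0;
    (has_tag, with_addr(tagged, a & !0x1))
}

fn main() {
    let value: u32 = 42;
    let base: *const u32 = &value;

    // Set the tag bit without ever building a pointer from a bare integer.
    let tagged = with_addr(base, addr(base) | 0x1);

    let (has_tag, untagged) = split_tag(tagged);
    assert!(has_tag);
    assert_eq!(untagged, base);

    // SAFETY: `untagged` has the address of `value` and was derived from
    // `base`, and `value` is still alive here.
    let read = unsafe { *untagged };
    assert_eq!(read, 42);
}
```

Every pointer in the sketch is derived from `base`, so the untagged pointer can be dereferenced without any integer-to-pointer cast.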

3 files changed: +461 -81 lines changed

library/core/src/ptr/const_ptr.rs

+139 -31
@@ -1,3 +1,6 @@
+// FIXME(strict_provenance_magic): this module still uses lots of casts to polyfill things.
+#![cfg_attr(not(bootstrap), allow(fuzzy_provenance_casts))]
+
 use super::*;
 use crate::cmp::Ordering::{self, Equal, Greater, Less};
 use crate::intrinsics;
@@ -60,44 +63,37 @@ impl<T: ?Sized> *const T {
 
     /// Casts a pointer to its raw bits.
     ///
-    /// This is equivalent to `as usize`, but is more specific to enhance readability.
-    /// The inverse method is [`from_bits`](#method.from_bits).
-    ///
-    /// In particular, `*p as usize` and `p as usize` will both compile for
-    /// pointers to numeric types but do very different things, so using this
-    /// helps emphasize that reading the bits was intentional.
-    ///
-    /// # Examples
+    /// In general, pointers cannot be understood as "just an integer"
+    /// and cannot be created from one without additional context.
     ///
-    /// ```
-    /// #![feature(ptr_to_from_bits)]
-    /// let array = [13, 42];
-    /// let p0: *const i32 = &array[0];
-    /// assert_eq!(<*const _>::from_bits(p0.to_bits()), p0);
-    /// let p1: *const i32 = &array[1];
-    /// assert_eq!(p1.to_bits() - p0.to_bits(), 4);
-    /// ```
+    /// If you would like to treat a pointer like an integer anyway,
+    /// see [`addr`][] and [`with_addr`][] for the responsible way to do that.
     #[unstable(feature = "ptr_to_from_bits", issue = "91126")]
-    pub fn to_bits(self) -> usize
+    pub fn to_bits(self) -> [u8; core::mem::size_of::<*const ()>()]
     where
         T: Sized,
     {
-        self as usize
+        unsafe { core::mem::transmute(self) }
     }
 
     /// Creates a pointer from its raw bits.
     ///
     /// This is equivalent to `as *const T`, but is more specific to enhance readability.
-    /// The inverse method is [`to_bits`](#method.to_bits).
+    /// The inverse method is [`to_bits`](#method.to_bits-1).
     ///
     /// # Examples
     ///
     /// ```
     /// #![feature(ptr_to_from_bits)]
     /// use std::ptr::NonNull;
-    /// let dangling: *const u8 = NonNull::dangling().as_ptr();
-    /// assert_eq!(<*const u8>::from_bits(1), dangling);
+    /// let dangling: *mut u8 = NonNull::dangling().as_ptr();
+    /// assert_eq!(<*mut u8>::from_bits(1), dangling);
     /// ```
+    #[rustc_deprecated(
+        since = "1.61.0",
+        reason = "This design is incompatible with Pointer Provenance",
+        suggestion = "from_addr"
+    )]
     #[unstable(feature = "ptr_to_from_bits", issue = "91126")]
     pub fn from_bits(bits: usize) -> Self
     where
@@ -106,6 +102,87 @@ impl<T: ?Sized> *const T {
         bits as Self
     }
 
+    /// Gets the "address" portion of the pointer.
+    ///
+    /// On most platforms this is a no-op, as the pointer is just an address,
+    /// and is equivalent to the deprecated `ptr as usize` cast.
+    ///
+    /// On more complicated platforms like CHERI and segmented architectures,
+    /// this may remove some important metadata. See [`with_addr`][] for
+    /// details on this distinction and why it's important.
+    #[unstable(feature = "strict_provenance", issue = "99999999")]
+    pub fn addr(self) -> usize
+    where
+        T: Sized,
+    {
+        // FIXME(strict_provenance_magic): I am magic and should be a compiler intrinsic.
+        self as usize
+    }
+
+    /// Creates a new pointer with the given address.
+    ///
+    /// See also: [`ptr::fake_alloc`][] and [`ptr::zst_exists`][].
+    ///
+    /// This replaces the deprecated `usize as ptr` cast, which had
+    /// fundamentally broken semantics because it couldn't restore
+    /// *segment* and *provenance*.
+    ///
+    /// A pointer semantically has 3 pieces of information associated with it:
+    ///
+    /// * Segment: The address-space it is part of.
+    /// * Provenance: An allocation (slice) that it is allowed to access.
+    /// * Address: The actual address it points at.
+    ///
+    /// The compiler and hardware need to properly understand all 3 of these
+    /// values at all times to properly execute your code.
+    ///
+    /// Segment and Provenance are implicitly defined by *how* a pointer is
+    /// constructed and generally propagate verbatim to all derived pointers.
+    /// It is therefore *impossible* to convert an address into a pointer
+    /// on its own, because there is no way to know what its segment and
+    /// provenance should be.
+    ///
+    /// By introducing a "representative" pointer into the process we can
+    /// properly construct a new pointer with *its* segment and provenance,
+    /// just as any other derived pointer would. This *should* be equivalent
+    /// to `wrapping_offset`ting the given pointer to the new address. See the
+    /// docs for `wrapping_offset` for the restrictions this applies.
+    ///
+    /// # Example
+    ///
+    /// Here is an example of how to properly use this API to mess around
+    /// with tagged pointers. Here we have a tag in the lowest bit:
+    ///
+    /// ```ignore
+    /// let my_tagged_ptr: *const T = ...;
+    ///
+    /// // Get the address and do whatever bit tricks we like
+    /// let addr = my_tagged_ptr.addr();
+    /// let has_tag = (addr & 0x1) != 0;
+    /// let real_addr = addr & !0x1;
+    ///
+    /// // Reconstitute a pointer with the new address and use it
+    /// let my_untagged_ptr = my_tagged_ptr.with_addr(real_addr);
+    /// let val = *my_untagged_ptr;
+    /// ```
+    #[unstable(feature = "strict_provenance", issue = "99999999")]
+    pub fn with_addr(self, addr: usize) -> Self
+    where
+        T: Sized,
+    {
+        // FIXME(strict_provenance_magic): I am magic and should be a compiler intrinsic.
+        //
+        // In the meantime, this operation is defined to be "as if" it was
+        // a wrapping_offset, so we can emulate it as such. This should properly
+        // restore pointer provenance even under today's compiler.
+        let self_addr = self.addr() as isize;
+        let dest_addr = addr as isize;
+        let offset = dest_addr.wrapping_sub(self_addr);
+
+        // This is the canonical desugaring of this operation
+        self.cast::<u8>().wrapping_offset(offset).cast::<T>()
+    }
+
     /// Decompose a (possibly wide) pointer into its address and metadata components.
     ///
     /// The pointer can be later reconstructed with [`from_raw_parts`].
@@ -305,10 +382,10 @@ impl<T: ?Sized> *const T {
     /// This operation itself is always safe, but using the resulting pointer is not.
     ///
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
-    /// be used to read or write other allocated objects.
+    /// be used to read or write other allocated objects. This is tracked by provenance.
     ///
-    /// In other words, `let z = x.wrapping_offset((y as isize) - (x as isize))` does *not* make `z`
-    /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
+    /// In other words, `let z = x.wrapping_offset((y.addr() as isize) - (x.addr() as isize))`
+    /// does *not* make `z` the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
     ///
@@ -320,8 +397,39 @@ impl<T: ?Sized> *const T {
     ///
     /// The delayed check only considers the value of the pointer that was dereferenced, not the
     /// intermediate values used during the computation of the final result. For example,
-    /// `x.wrapping_offset(o).wrapping_offset(o.wrapping_neg())` is always the same as `x`. In other
-    /// words, leaving the allocated object and then re-entering it later is permitted.
+    /// `x.wrapping_offset(o).wrapping_offset(o.wrapping_neg())` is always the same as `x`...
+    ///
+    /// Usually.
+    ///
+    /// More work needs to be done to define the rules here, but on CHERI it is not *actually*
+    /// a no-op to wrapping_offset a pointer to some random address and back again. For practical
+    /// applications that actually need this, it *will* generally work, but if your offset is
+    /// "too out of bounds" the system will mark your pointer as invalid, and subsequent reads
+    /// will fault *as if* the pointer had been corrupted by a non-pointer instruction.
+    ///
+    /// CHERI has a roughly 64-bit address space but its 128-bit pointers contain
+    /// 3 ostensibly-address-space-sized values:
+    ///
+    /// * 2 values for the "slice" that the pointer can access.
+    /// * 1 value for the actual address it points to.
+    ///
+    /// To accomplish this, CHERI compresses the values and even requires large allocations
+    /// to have higher alignment to free up extra bits. This compression scheme can support
+    /// the pointer being offset outside of the slice, but only to an extent. A *generous*
+    /// extent, but a limited one nonetheless. To quote CHERI's documentation:
+    ///
+    /// > With 27 bits of the capability used for bounds, CHERI-MIPS and 64-bit
+    /// > CHERI-RISC-V provide the following guarantees:
+    /// >
+    /// > * A pointer is able to travel at least 1⁄4 the size of the object, or 2 KiB,
+    /// > whichever is greater, above its upper bound.
+    /// > * It is able to travel at least 1⁄8 the size of the object, or 1 KiB,
+    /// > whichever is greater, below its lower bound.
+    ///
+    /// Needless to say, any scheme that relies on reusing the least significant bits
+    /// of a pointer based on alignment is going to be fine. Any scheme which tries
+    /// to set *high* bits isn't going to work, but that was *already* extremely
+    /// platform-specific and not at all portable.
     ///
     /// [`offset`]: #method.offset
     /// [allocated object]: crate::ptr#allocated-object
@@ -427,10 +535,10 @@ impl<T: ?Sized> *const T {
     /// ```rust,no_run
     /// let ptr1 = Box::into_raw(Box::new(0u8)) as *const u8;
     /// let ptr2 = Box::into_raw(Box::new(1u8)) as *const u8;
-    /// let diff = (ptr2 as isize).wrapping_sub(ptr1 as isize);
+    /// let diff = (ptr2.addr() as isize).wrapping_sub(ptr1.addr() as isize);
     /// // Make ptr2_other an "alias" of ptr2, but derived from ptr1.
     /// let ptr2_other = (ptr1 as *const u8).wrapping_offset(diff);
-    /// assert_eq!(ptr2 as usize, ptr2_other as usize);
+    /// assert_eq!(ptr2.addr(), ptr2_other.addr());
     /// // Since ptr2_other and ptr2 are derived from pointers to different objects,
     /// // computing their offset is undefined behavior, even though
     /// // they point to the same address!
@@ -653,7 +761,7 @@ impl<T: ?Sized> *const T {
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
     /// be used to read or write other allocated objects.
     ///
-    /// In other words, `let z = x.wrapping_add((y as usize) - (x as usize))` does *not* make `z`
+    /// In other words, `let z = x.wrapping_add((y.addr()) - (x.addr()))` does *not* make `z`
     /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
@@ -715,7 +823,7 @@ impl<T: ?Sized> *const T {
     /// The resulting pointer "remembers" the [allocated object] that `self` points to; it must not
     /// be used to read or write other allocated objects.
     ///
-    /// In other words, `let z = x.wrapping_sub((x as usize) - (y as usize))` does *not* make `z`
+    /// In other words, `let z = x.wrapping_sub((x.addr()) - (y.addr()))` does *not* make `z`
     /// the same as `y` even if we assume `T` has size `1` and there is no overflow: `z` is still
     /// attached to the object `x` is attached to, and dereferencing it is Undefined Behavior unless
     /// `x` and `y` point into the same allocated object.
@@ -1003,7 +1111,7 @@ impl<T> *const [T] {
     /// use std::ptr;
     ///
     /// let slice: *const [i8] = ptr::slice_from_raw_parts(ptr::null(), 3);
-    /// assert_eq!(slice.as_ptr(), 0 as *const i8);
+    /// assert_eq!(slice.as_ptr(), ptr::null());
     /// ```
     #[inline]
     #[unstable(feature = "slice_ptr_get", issue = "74265")]
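
The `wrapping_offset` doc changes in the diff above make two claims that can be checked by address alone: a pointer derived from one allocation can be moved to the address of another, and offsetting by `o` and then by `o.wrapping_neg()` returns to the starting address. The following is a minimal sketch of both, assuming an ordinary (non-CHERI) target. The function names `alias_addresses` and `round_trip` are hypothetical, and plain `as usize` casts stand in for the `addr()` method, which is unstable in this commit.

```rust
/// Returns (address of ptr2, address of a pointer derived from ptr1 that
/// was offset to ptr2's address). The derived pointer is never
/// dereferenced: it still belongs to ptr1's allocation, so reading ptr2's
/// byte through it would be undefined behavior.
fn alias_addresses() -> (usize, usize) {
    let ptr1 = Box::into_raw(Box::new(0u8));
    let ptr2 = Box::into_raw(Box::new(1u8));

    // Distance between the two addresses, computed purely on integers.
    let diff = (ptr2 as usize as isize).wrapping_sub(ptr1 as usize as isize);

    // Same address as ptr2, but derived from ptr1.
    let ptr2_other = (ptr1 as *const u8).wrapping_offset(diff);
    let result = (ptr2 as usize, ptr2_other as usize);

    // SAFETY: both pointers came from Box::into_raw and are freed exactly once.
    unsafe {
        drop(Box::from_raw(ptr1));
        drop(Box::from_raw(ptr2));
    }
    result
}

/// Leaves the allocation by `o` bytes and comes back again.
fn round_trip(x: *const u8, o: isize) -> *const u8 {
    x.wrapping_offset(o).wrapping_offset(o.wrapping_neg())
}

fn main() {
    let (addr2, addr2_other) = alias_addresses();
    assert_eq!(addr2, addr2_other);

    let byte = 5u8;
    let p: *const u8 = &byte;
    let back = round_trip(p, 1 << 20);
    assert_eq!(back, p);

    // SAFETY: `back` is derived from `p`, has its address, and `byte` is alive.
    assert_eq!(unsafe { *back }, 5);
}
```

Equal addresses do not imply interchangeable pointers, so the sketch only compares `ptr2_other` and never reads through it.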
