Lazy or eager ptr/int casts? #786

RalfJung · 2019-06-22T12:28:32Z

With intptrcast coming up, we have to make a decision whether we want the ptr/int casts to happen "lazily" or "eagerly". Ultimately this will be a question for UCG/lang-team to decide as part of the Rust/MIR semantics, and there are some hard open questions here, some of which are discussed in this paper. But for now we have to pick something.

ptr-to-int

"Eager" ptr-to-int cast means that we cast from ptr to int when executing the cast MIR statement (corresponding to the as or a coercion in the surface language).
"Lazy" ptr-to-int cast is basically what we do now, where the cast statement does nothing and we only perform the actual conversion when the int is subject to an operation where we need the raw bits.

To a first approximation, with eager casts we have an invariant that a variable of integer type carries an integer value; with lazy casts we don't.
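To make the difference concrete, here is a minimal toy model of the two strategies. Everything in it (`Scalar`, `Pointer`, `Machine`, `cast_eager`, `cast_lazy`, `mul`, and the base-address table) is invented for illustration and is not Miri's actual implementation; only the name `force_bits` is borrowed from the discussion.

```rust
// Toy model only: hypothetical types, not Miri's real data structures.
use std::collections::HashMap;

/// An abstract pointer: an allocation ID plus an offset into that allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pointer {
    pub alloc_id: u32,
    pub offset: u64,
}

/// A scalar as the interpreter tracks it: raw bits or an abstract pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scalar {
    Bits(u64),
    Ptr(Pointer),
}

/// Holds a made-up base address for each allocation.
pub struct Machine {
    pub base_addrs: HashMap<u32, u64>,
}

impl Machine {
    /// Turn any scalar into raw bits; pointers are resolved via the base-address table.
    pub fn force_bits(&self, s: Scalar) -> Option<u64> {
        match s {
            Scalar::Bits(b) => Some(b),
            Scalar::Ptr(p) => self
                .base_addrs
                .get(&p.alloc_id)
                .map(|base| base.wrapping_add(p.offset)),
        }
    }

    /// Eager cast: convert at the cast statement, so the result is always `Bits`.
    pub fn cast_eager(&self, s: Scalar) -> Option<Scalar> {
        self.force_bits(s).map(Scalar::Bits)
    }

    /// Lazy cast: the cast statement does nothing; the value may still be a `Ptr`.
    pub fn cast_lazy(&self, s: Scalar) -> Scalar {
        s
    }

    /// An operation that needs raw bits; with lazy casts, conversion happens here.
    pub fn mul(&self, a: Scalar, factor: u64) -> Option<Scalar> {
        self.force_bits(a)
            .map(|bits| Scalar::Bits(bits.wrapping_mul(factor)))
    }
}
```

In this model the eager invariant is visible in the types of the results: `cast_eager` can only ever yield `Scalar::Bits`, while every consumer of a lazily cast value (such as `mul`) has to remember to call `force_bits` itself.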

Eager casts are somewhat easier to understand and less confusing. Extra invariants are nice. However, that leaves open the question of code that "circumvents" the cast, such as transmuting a pointer to an integer:

let x = &mut 42 as *mut _;
let y: usize = mem::transmute(x); // is this legal?
let z = y + 1; // if yes, is this legal?
let z = z*2; // or this?

If we want to allow all of these operations, the aforementioned invariant does not actually hold, and we still have to remember to force_bits everywhere. Disallowing even the transmute would basically mean enforcing the aforementioned invariant: when validating integers, we would not allow pointers.
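As a rough sketch of what "when validating integers, we don't allow pointers" could look like, consider the following. The names `Value`, `ValidationError`, `validate_integer`, and the `strict` parameter are all hypothetical and chosen for illustration; they are not taken from Miri.

```rust
// Hypothetical sketch; names are invented and are not Miri's API.

/// A value as an interpreter might store it: raw bits or an abstract pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    Bits(u64),
    /// An abstract pointer, identified only by allocation ID and offset.
    Ptr { alloc_id: u32, offset: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    /// A pointer value was found at an integer type.
    PointerAtIntegerType,
}

/// Validate a value that is stored at an integer type.
///
/// With `strict == true`, pointer values are rejected, which enforces the
/// invariant that integer-typed variables carry integer values (and so
/// rules out the pointer-to-integer transmute). With `strict == false`,
/// pointers are tolerated and later operations must convert them on demand.
pub fn validate_integer(v: Value, strict: bool) -> Result<(), ValidationError> {
    match v {
        Value::Bits(_) => Ok(()),
        Value::Ptr { .. } if strict => Err(ValidationError::PointerAtIntegerType),
        Value::Ptr { .. } => Ok(()),
    }
}
```

The two settings of `strict` correspond to the two ends of the trade-off described here: more accepted code on one side, a stronger invariant on the other.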

I am torn between allowing as much code as possible that people reasonably expect to work, simplifying the code by minimizing the number of places where we force_bits/force_ptr, and knowing that the only answer that is actually formally good enough to justify LLVM's optimizations (excluding all pointer values at integer types) is likely going to upset people.^^

The only thing I am fairly sure I want is that a ptr-to-int cast in the surface language actually does ptr_to_int in Miri. I know I argued against that in the past, but I came to realize that not doing the cast is confusing, and doing it also helps a lot with testing.

int-to-ptr

For the other direction, we cannot eagerly do int-to-ptr conversion when an integer gets turned into a raw pointer as that is a safe operation. And similarly the user can transmute integers to pointers, so even if we cast eagerly when a reference gets created, we still have to handle integer values in the memory access operations.

So, we face a similar situation as in the ptr-to-int case: if we allow maximal amounts of code, we have to handle integer values everywhere and cannot have any meaningful extra invariant. Still, we should probably make sure that when a reference gets created, it gets turned into a pointer value. Or maybe retagging can just take care of that.
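One way to see why int-to-ptr is the harder direction is that, unlike ptr-to-int, it can fail: an arbitrary integer need not correspond to any allocation. The following is a minimal sketch under that assumption. `Allocation`, `Pointer`, and `int_to_ptr` are hypothetical names with made-up base addresses, not Miri's implementation, and one-past-the-end addresses are deliberately treated as out of bounds to keep the example short.

```rust
// Hypothetical toy model, not Miri's implementation.

/// A live allocation with a made-up base address and a size in bytes.
pub struct Allocation {
    pub id: u32,
    pub base: u64,
    pub size: u64,
}

/// An abstract pointer: allocation ID plus offset.
#[derive(Debug, PartialEq, Eq)]
pub struct Pointer {
    pub alloc_id: u32,
    pub offset: u64,
}

/// Try to interpret `addr` as a pointer into one of `allocs`.
/// Returns `None` when the address lies inside no live allocation.
/// (Simplification: one-past-the-end addresses are not accepted.)
pub fn int_to_ptr(allocs: &[Allocation], addr: u64) -> Option<Pointer> {
    allocs
        .iter()
        .find(|a| addr >= a.base && addr - a.base < a.size)
        .map(|a| Pointer {
            alloc_id: a.id,
            offset: addr - a.base,
        })
}
```

Because the conversion returns an `Option`, performing it eagerly at every safe integer-to-raw-pointer cast would mean raising an error in safe code whenever the integer is not a valid address, which is why the memory access operations still have to cope with plain integer values.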

@oli-obk (and anyone else reading) any opinions?


oli-obk · 2019-06-24T07:22:47Z

> For the other direction, we cannot eagerly do int-to-ptr conversion when an integer gets turned into a raw pointer as that is a safe operation.

casting references to raw pointers and then to integers is also a safe operation. Why is the argument different for ptr-to-int? Is it because int-to-ptr is fallible in contrast to ptr-to-int?

> knowing that the only answer that is actually formally good enough to justify LLVM's optimizations (excluding all pointer values at integer types) is likely going to upset people.^^

that sentence is a bit hard to parse. Are you saying that LLVM misoptimizes pointer values in variables with int type? And thus we should keep enforcing that invariant in miri, making the ptr-to-int transmute UB?

RalfJung · 2019-06-24T19:38:07Z

> casting references to raw pointers and then to integers is also a safe operation. Why is the argument different for ptr-to-int? Is it because int-to-ptr is fallible in contrast to ptr-to-int?

Exactly.

> that sentence is a bit hard to parse. Are you saying that LLVM misoptimizes pointer values in variables with int type? And thus we should keep enforcing that invariant in miri, making the ptr to int transmute UB

I am saying that there is no known formal model that justifies what LLVM is doing without making ptr-to-int transmutes UB. Constructing miscompilations is possible in theory; making LLVM actually apply the right optimizations in the right order for this may or may not be possible.

But this is an academic problem in every sense of the word, and I think ruling out ptr-to-int transmutes in the surface language currently is not a good idea, as much as I'd like to do it.

oli-obk · 2019-06-25T08:15:40Z

> I am saying that there is no known formal model that justifies what LLVM is doing without making ptr-to-int transmutes UB

So... is there any reason we couldn't hide making-the-transmutes-not-UB behind a miri command line flag and defaulting to eager conversion?

RalfJung · 2019-06-25T08:28:09Z

We could. I am just afraid of an exploding test matrix.

oli-obk · 2019-06-25T08:46:20Z

well, compiletest supports running tests in multiple passes with differing flags and error markings. Unless you mean the runtime of the tests

RalfJung · 2019-06-27T22:19:13Z

I think for now that's not worth it -- the extra value of this stricter mode does not justify the effort.

What would be nice is to have a mode that can still detect alignment failures -- some weaker form of intptrcast that lets us remove most of the hacks we carry, but ignores "accidental" alignment when checking memory accesses. I think we might even be implementing this accidentally currently, that's basically this FIXME. ;)

RalfJung · 2019-06-28T17:10:46Z

For detecting alignment failures, after consulting a few people, the plan for now is to allow code by default to "guess" the right alignment (this reduces false positives), but to emit a warning and offer an option to turn the warning into an error (or silence it).
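The plan above can be sketched as a three-way policy. This is only an illustration: `GuessPolicy`, `Outcome`, and `check_align` are invented names, not Miri flags or functions, and the model assumes all alignments are nonzero powers of two.

```rust
// Hypothetical sketch; the policy names are invented and are not Miri options.

/// What to do when an access is aligned only because of the concrete address
/// the allocation happened to receive (a "guessed" alignment).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GuessPolicy {
    /// Accept silently.
    Allow,
    /// Accept, but emit a warning (the default described above).
    Warn,
    /// Treat as an error.
    Deny,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Accepted,
    AcceptedWithWarning,
    Rejected,
}

/// Check an access at absolute address `addr` that requires `required_align`,
/// into an allocation that only guarantees `alloc_align`.
/// Assumes both alignments are nonzero powers of two.
pub fn check_align(
    addr: u64,
    required_align: u64,
    alloc_align: u64,
    policy: GuessPolicy,
) -> Outcome {
    if addr % required_align != 0 {
        // Actually misaligned: always an error, regardless of policy.
        return Outcome::Rejected;
    }
    if alloc_align >= required_align {
        // The allocation's own alignment already guarantees this access.
        return Outcome::Accepted;
    }
    // Aligned only by accident of where the allocation landed.
    match policy {
        GuessPolicy::Allow => Outcome::Accepted,
        GuessPolicy::Warn => Outcome::AcceptedWithWarning,
        GuessPolicy::Deny => Outcome::Rejected,
    }
}
```

For example, an 8-byte-aligned access at address 0x1008 into an allocation that only promises alignment 1 yields a warning under `GuessPolicy::Warn`, while the same access into an 8-aligned allocation is accepted outright.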

RalfJung · 2019-06-29T10:49:05Z

Looks like we came to an agreement on how to progress; the implementation is already tracked at #224.

RalfJung added the labels C-proposal (Category: a proposal for something we might want to do, or maybe not; details still being worked out) and A-interpreter (Area: affects the core interpreter) on Jun 22, 2019

RalfJung added the label A-intptrcast (Area: affects int2ptr and ptr2int casts) and removed the label A-interpreter (Area: affects the core interpreter) on Jun 28, 2019

RalfJung closed this as completed Jun 29, 2019
