Lazy or eager ptr/int casts? #786

Closed
RalfJung opened this issue Jun 22, 2019 · 8 comments
Labels
A-intptrcast Area: affects int2ptr and ptr2int casts C-proposal Category: a proposal for something we might want to do, or maybe not; details still being worked out

Comments

@RalfJung
Member

RalfJung commented Jun 22, 2019

With intptrcast coming up, we have to make a decision whether we want the ptr/int casts to happen "lazily" or "eagerly". Ultimately this will be a question for UCG/lang-team to decide as part of the Rust/MIR semantics, and there are some hard open questions here, some of which are discussed in this paper. But for now we have to pick something.

ptr-to-int

  • "Eager" ptr-to-int cast means that we cast from ptr to int when executing the cast MIR statement (corresponding to the as or a coercion in the surface language).

  • "Lazy" ptr-to-int cast is basically what we do now, where the cast statement does nothing and we only perform the actual conversion when the int is subject to an operation where we need the raw bits.

To a first approximation, with eager casts we have an invariant that a variable of integer type carries an integer value; with lazy casts we don't.
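The difference between the two strategies can be sketched with a toy value model. This is only an illustration under stated assumptions: `Scalar`, `base_address`, and the two `cast_ptr_to_int_*` functions are hypothetical names invented here, not Miri's actual types, and `force_bits` merely mimics the role that name plays in this discussion.

```rust
/// Toy interpreter scalar: either raw bits or an abstract pointer.
/// Hypothetical; not Miri's real data model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scalar {
    Bits(u64),
    Ptr { alloc_id: u32, offset: u64 },
}

/// Hypothetical table assigning a base address to each allocation.
fn base_address(alloc_id: u32) -> u64 {
    0x1000 * (alloc_id as u64 + 1)
}

/// Turn any scalar into raw bits.
fn force_bits(s: Scalar) -> u64 {
    match s {
        Scalar::Bits(b) => b,
        Scalar::Ptr { alloc_id, offset } => base_address(alloc_id) + offset,
    }
}

/// Eager: the cast statement itself performs the conversion,
/// so the result is always `Scalar::Bits`.
fn cast_ptr_to_int_eager(s: Scalar) -> Scalar {
    Scalar::Bits(force_bits(s))
}

/// Lazy: the cast statement is a no-op; the value may still be a pointer.
fn cast_ptr_to_int_lazy(s: Scalar) -> Scalar {
    s
}

/// An integer operation that needs raw bits. With lazy casts, every such
/// operation has to remember to call `force_bits` on its operands.
fn mul(a: Scalar, b: u64) -> Scalar {
    Scalar::Bits(force_bits(a).wrapping_mul(b))
}

fn main() {
    let p = Scalar::Ptr { alloc_id: 0, offset: 4 };
    let eager = cast_ptr_to_int_eager(p);
    let lazy = cast_ptr_to_int_lazy(p);
    assert_eq!(eager, Scalar::Bits(0x1004)); // invariant holds right after the cast
    assert_eq!(lazy, p);                     // integer-typed value is still a pointer
    assert_eq!(mul(eager, 2), mul(lazy, 2)); // both agree once bits are forced
}
```

In this sketch the eager variant establishes the "integer type carries an integer value" invariant at the cast, while the lazy variant pushes the conversion into every operation that needs bits.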

Eager casts are somewhat easier to understand and less confusing. Extra invariants are nice. However, that leaves open the question of code that "circumvents" the cast, such as transmuting a pointer to an integer:

let x = &mut 42 as *mut _;
let y: usize = unsafe { mem::transmute(x) }; // is this legal?
let z = y + 1; // if yes, is this legal?
let z = z * 2; // or this?

If we want to allow all of these operations, the aforementioned invariant does not actually hold, and we still have to remember to force_bits everywhere. Disallowing even the transmute would basically mean enforcing that invariant: when validating integers, we don't allow pointers.

I am torn between allowing as much code as possible that people reasonably expect to work, simplifying the code by minimizing the amount of places where we force_bits/force_ptr, and knowing that the only answer that is actually formally good enough to justify LLVM's optimizations (excluding all pointer values at integer types) is likely going to upset people.^^

The only thing I am fairly sure I want is that a ptr-to-int-cast in the surface language actually does ptr_to_int in Miri. I know I argued against that in the past, but came to realize it is confusing, and also doing this cast helps a lot with testing.

int-to-ptr

For the other direction, we cannot eagerly do int-to-ptr conversion when an integer gets turned into a raw pointer as that is a safe operation. And similarly the user can transmute integers to pointers, so even if we cast eagerly when a reference gets created, we still have to handle integer values in the memory access operations.

So we face a similar situation as in the ptr-to-int case: if we allow maximal amounts of code, we have to handle integer values everywhere, and we cannot have any meaningful extra invariant. Still, we should probably make sure that when a reference gets created, it gets turned into a pointer value. Or maybe retagging can just take care of that.
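As a small illustration of why the int-to-ptr direction cannot be made fallible at the cast: in the surface language the cast is safe and only the later dereference is `unsafe`. The helper names `int_to_raw` and `read_via_int` below are made up for this sketch.

```rust
/// Casting an integer to a raw pointer is safe and cannot fail,
/// even when no allocation lives at that address.
fn int_to_raw(addr: usize) -> *const u8 {
    addr as *const u8
}

/// Round-trip a reference through an integer and read through the result.
fn read_via_int(x: &u8) -> u8 {
    let addr = x as *const u8 as usize; // ptr-to-int cast
    let p = addr as *const u8; // int-to-ptr cast, still safe code
    // Only the dereference is unsafe. This is the point where an interpreter
    // would have to resolve the integer back to an allocation.
    unsafe { *p }
}

fn main() {
    // Creating a pointer from an arbitrary integer is fine as long as
    // it is never dereferenced.
    let dangling = int_to_raw(16);
    assert_eq!(dangling as usize, 16);

    let x = 7u8;
    assert_eq!(read_via_int(&x), 7);
}
```

Because `int_to_raw` must succeed for any input, any check that the integer corresponds to a live allocation has to happen at the memory access instead.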

@oli-obk (and anyone else reading) any opinions?

@RalfJung RalfJung added C-proposal Category: a proposal for something we might want to do, or maybe not; details still being worked out A-interpreter Area: affects the core interpreter labels Jun 22, 2019
@oli-obk
Contributor

oli-obk commented Jun 24, 2019

For the other direction, we cannot eagerly do int-to-ptr conversion when an integer gets turned into a raw pointer as that is a safe operation.

casting references to raw pointers and then to integers is also a safe operation. Why is the argument different for ptr-to-int? Is it because int-to-ptr is fallible in contrast to ptr-to-int?

knowing that the only answer that is actually formally good enough to justify LLVM's optimizations (excluding all pointer values at integer types) is likely going to upset people.^^

that sentence is a bit hard to parse. Are you saying that LLVM misoptimizes pointer values in variables with int type? And thus we should keep enforcing that invariant in miri, making the ptr to int transmute UB?

@RalfJung
Member Author

casting references to raw pointers and then to integers is also a safe operation. Why is the argument different for ptr-to-int? Is it because int-to-ptr is fallible in contrast to ptr-to-int?

Exactly.

that sentence is a bit hard to parse. Are you saying that LLVM misoptimizes pointer values in variables with int type? And thus we should keep enforcing that invariant in miri, making the ptr to int transmute UB?

I am saying that there is no known formal model that justifies what LLVM is doing without making ptr-to-int transmutes UB. Constructing miscompilations is possible in theory; whether LLVM can actually be made to apply the right optimizations in the right order for this is unclear.

But this is an academic problem in every sense of the word, and I think ruling out ptr-to-int transmutes in the surface language currently is not a good idea, as much as I'd like to do it.

@oli-obk
Contributor

oli-obk commented Jun 25, 2019

I am saying that there is no known formal model that justifies what LLVM is doing without making ptr-to-int transmutes UB

So... is there any reason we couldn't hide making-the-transmutes-not-UB behind a miri command line flag and defaulting to eager conversion?

@RalfJung
Member Author

We could. I am just afraid of an exploding test matrix.

@oli-obk
Contributor

oli-obk commented Jun 25, 2019

well, compiletest supports running tests in multiple passes with differing flags and error markings. Unless you mean the runtime of the tests

@RalfJung
Member Author

I think for now that's not worth it -- the extra value of this stricter mode does not justify the effort.

What would be nice is to have a mode that can still detect alignment failures -- some weaker form of intptrcast that lets us remove most of the hacks we carry, but ignores "accidental" alignment when checking memory accesses. I think we might even be implementing this accidentally currently, that's basically this FIXME. ;)

@RalfJung
Member Author

RalfJung commented Jun 28, 2019

For detecting alignment failures, after consulting a few people, the plan for now is to allow code by default to "guess" the right alignment (this reduces false positives), but to emit a warning and offer an option to turn the warning into an error (or silence it).
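One way to picture that plan is a check that separates guaranteed alignment from "accidental" alignment. This is a rough sketch only: `AccidentalAlign`, `Outcome`, and `check_align` are hypothetical names, the three modes are an assumption modelled on the warn/error/silence options mentioned above, and all alignments are assumed to be powers of two.

```rust
/// Hypothetical policy for accesses whose address merely happens to be aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AccidentalAlign {
    Warn,   // assumed default: accept, but report
    Deny,   // turn the warning into an error
    Silent, // accept without reporting
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    OkWithWarning,
    Error,
}

/// `alloc_align`: alignment guaranteed by the allocation (power of two).
/// `addr`: concrete address being accessed.
/// `required`: alignment the access needs (power of two).
fn check_align(alloc_align: u64, addr: u64, required: u64, mode: AccidentalAlign) -> Outcome {
    if addr % required != 0 {
        // Genuinely misaligned at this concrete address.
        return Outcome::Error;
    }
    if alloc_align >= required {
        // The allocation itself guarantees the alignment.
        return Outcome::Ok;
    }
    // The address is aligned only because of where the allocation
    // happened to be placed: the code "guessed" right.
    match mode {
        AccidentalAlign::Warn => Outcome::OkWithWarning,
        AccidentalAlign::Deny => Outcome::Error,
        AccidentalAlign::Silent => Outcome::Ok,
    }
}

fn main() {
    // A byte buffer (align 1) accessed as a 4-byte value at 0x1004.
    assert_eq!(check_align(1, 0x1004, 4, AccidentalAlign::Warn), Outcome::OkWithWarning);
    assert_eq!(check_align(1, 0x1004, 4, AccidentalAlign::Deny), Outcome::Error);
    // A properly aligned allocation passes in every mode.
    assert_eq!(check_align(4, 0x1004, 4, AccidentalAlign::Deny), Outcome::Ok);
}
```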

@RalfJung RalfJung added A-intptrcast Area: affects int2ptr and ptr2int casts and removed A-interpreter Area: affects the core interpreter labels Jun 28, 2019
@RalfJung
Member Author

Looks like we came to an agreement on how to progress; the implementation is already tracked at #224.
