Proposal: Introduce safety-checked UB for reading invalid bit patterns via @bitCast, pointers, and untagged unions #6784
Comments
What are the reasons for not "doing something radical" and rejecting all of this at compile time (including untagged unions with sparse fields unless they share the same legal bit patterns)? I suppose I am missing some obvious C interop reasons...
@cajw1 I'm not personally opposed to these "radical" changes/limitations either. I wanted to clarify that I wouldn't blindly set them in stone up front though, since
@rohlem, I had not thought about heap allocation... Looking at
As soon as you cast a non-sparse to a sparse, you are really overpromising, so if such a cast were considered to
Just to be clear, this proposal is (intentionally) not about
I'd like not to sidetrack too much, but to quickly address your ideas @cajw1:
I'd really appreciate more feedback from more knowledgeable developers than us, but if this is deemed the right direction, then a type distinction
Introducing a second concept of Would Valid:
Possibly invalid:
Unfortunately, I do not see how you will get valid data from an allocator. Maybe you also pass in an initialization function to the allocator? This would take an OO constructor and turn it inside out. Sorry for the incoherent rambling. Clearly I have not had enough caffeine yet. It does seem like this concept of valid vs. invalid/unproven valid is possibly very powerful and probably different from
@kyle-github I think a keyword (type attribute) is necessary as soon as it affects interfaces. To quote the example from #3328 (comment), output parameter pointers (f.e. Either way, I'd like for this proposal to remain about the idea that "we want a panic (safety-checked UB) when we cast to/read invalid values", and not conflate it with solutions for dealing with invalid values (which needs to be done before they are cast to/read).
Note that if this is accepted, the implementations of
Status-quo
The Zig type system allows for types that don't cleanly fit into bytes / registers; I'll call them sparse types for short. These have invalid bit patterns, which do not map to a value of the specified type. For example:
- *T cannot be 0, the bit pattern that ?*T uses to represent null.
- enum {a, b, c} needs more than 1 bit but less than 2 bits to be represented, so a 2-bit representation leaves one pattern unused.
The only way to produce invalid values of sparse types is via reinterpretation operations, which include:
- @bitCast, @ptrCast, @intToPtr
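The two sparse types named above can be checked at compile time. This is only a minimal sketch, assuming Zig 0.11 or newer (where the checked integer-to-enum cast is spelled `@enumFromInt`); the names `Abc` and `abcFromRaw` are hypothetical and not part of the proposal.

```zig
const std = @import("std");

// Hypothetical enum mirroring the one in the text.
const Abc = enum { a, b, c };

comptime {
    // null reuses the address-0 pattern, so the optional needs no extra storage.
    std.debug.assert(@sizeOf(?*u8) == @sizeOf(*u8));
    // Three tags are stored in a 2-bit tag, so the pattern 3 names no tag.
    std.debug.assert(@bitSizeOf(Abc) == 2);
}

/// Checked, value-preserving cast: a raw value of 3 has no matching tag,
/// which is documented as safety-checked undefined behaviour.
fn abcFromRaw(raw: u2) Abc {
    return @enumFromInt(raw);
}
```

The checked cast is the status-quo baseline; the reinterpretation operations listed above are the paths that currently bypass such a check.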
As far as I could find, the documentation doesn't state how invalid values are handled in the general case. Instead, every checked, value-preserving cast is individually documented to trigger safety-checked undefined behaviour if the input value is out-of-bounds for the target type.
This proposal
Formally specify that constructing an invalid value is always immediately illegal (safety-checked undefined) behaviour.
I wrote a couple of example scenarios. These conditions are currently not checked (I can create individual issues if the general idea is accepted):
- [...] invalid values.
- [...] invalid value of that type.
- @bitCast-ing to a packed struct/union with fields of sparse types can lead to those fields' values being invalid.
  (Essentially every potentially-new read of a packed struct/union field of a sparse type needs to be checked.)
- @ptrCast-ing the address of any variable of a sparse type to a modifiable pointer-to-different-type means that new pointer can be used to write invalid values readable as the source type.
  (I don't think this is something a non-omniscient implementation can guard against, unless we did something radical like outlawing @ptrCast from sparse source types. Technically this would work to some extent. You could declare a packed/extern union to get the original behaviour, which also signals to the compiler that invalid values might appear in that location. This safety falls apart, though, if you @ptrCast a sparse type to a union containing it that it wasn't declared in.)
As a special case, stage1 currently always loads all integer types by exactly their bit width (masking all other bits off), even when they have byte alignment (i.e. outside of packed types) and even when runtime safety is disabled. This ensures valid values by hiding the actual invalid bit pattern, potentially hiding semantic errors. I feel like this is also the wrong default, and worth reconsidering.
Update (2023-06-26): Sentinel-terminated arrays
[n:s]T are also an interesting case to consider in this context. (I can't remember if they were in the language when I wrote this proposal originally.)
- @sizeOf([n:s]T) == @sizeOf([n+1]T)
- [...] sparse types (only if T is), they are also vulnerable to invalid values when reinterpreting memory [n+1]T => [n:s]T.
- [...] @as([n:s]T, undefined) should correctly establish the sentinel element or not.
  In my personal opinion it should, since it's a type invariant (similar to an unused higher byte in a padded integer), and only the other elements ("effective information") should be undefined.