Proposal: `undefined` detection in safe/debug builds #15585
Comments
I think you've got exhaustive vs non-exhaustive mixed up in these two passages:
Also, in reference to:
This isn't entirely true. packed structs could follow the same rule as their backing integer (non powers of two or less than 8 bits). Same with packed union I imagine? Also, not explicitly mentioned in the proposal, but a thought on optional slices: could we represent an undefined
Not sure how well this translates to other operating systems or embedded, but couldn't we set it to
Yep, serves me right for writing a proposal under time pressure - edited.
Good point on the packed stuff. Extern types seem like they could use padding bits/bytes upon first glance, but this unfortunately falls apart because pointers exist. One reason we want to mark all struct fields separately is that if you take a pointer to a field and mutate it, you want to mark the field as defined. If you have one defined tag for the whole struct, that's not possible, because the pointer "loses" the information about the struct.
That seems reasonable to me - again, this is highly related to the optional optimisation stuff in #104.
That would probably work on most (all?) "proper" OSes - in fact the Zig compiler currently uses a trick like this for efficiently representing some internal data structures! - but imo we don't really want to depend on details of the target OS in the language itself for features like this. Standard nullable pointers are actually kinda rare to see in Zig, so I don't think this is a huge loss.
duplicate (concretization) of #211
(closing as duplicate)
I have a question that is related to this issue (but not the linked dup one): As far as I know, Microsoft’s C compiler emits However, although the modern CPUs protects against executing data sections,
That is to say,
TL;DR: I want to propose we modify
P.S.: Should I open a new issue about this idea?
@m13253 make a new issue
Motivation
In the status quo, all `undefined` bytes are set to `0xAA` in safe builds. This is good for manual detection with a debugger, and is helpful in cases where all values are valid, such as power-of-two-sized integers, but in a lot of cases we can do better. For many types, certain values are invalid, so the compiler can dedicate one of them as the `undefined` value. This would allow `undefined` to be correctly propagated through values in safe builds, and ultimately give us runtime safety checks for branching on `undefined`. In particular, something like this might have allowed us to much more easily identify the CI failures which @jacobly0 just tracked down.
Representations
Our goal here is to identify any type with an unused value which we can reserve to mean `undefined`.
- `bool` can use a padding bit.
- A type `uX` or `iX` where `X` is not a power of two or is under 8 (i.e. `X` is not one of 8, 16, 32, ...) can use a padding bit.
- An exhaustive `enum` with unused tags can have a dummy tag added to represent `undefined`. A non-exhaustive enum defers to the above rule for its tag type.
- A `struct` should not itself be marked as undefined, but rather all of its fields (where possible) should be recursively marked as such. A `union` or `union(enum)` can have its tag set to undefined where possible, or have an extra bit otherwise. There's nothing we can (consistently) do for a `packed struct`, `packed union`, `extern struct`, or `extern union`.
- An undefined array is equivalent to an array full of undefined values.
- We can't do anything about "standard" (ABI-allowed) nullable pointers, but slices could be made larger if necessary. For non-nullable pointers and slices, we could use the null pointer value.
- Other optionals can use a padding bit.
- Error sets can use a special tag (maybe `maxInt(u16)` or similar) to represent `undefined`. Error unions could do the same.
- Vectors, like arrays, can set their elements to `undefined`, but of course this will only work for base types which are an int type with a non-power-of-two number of bits (at least 8).
- I don't know how async frames are represented, but we can surely just add an extra bit if necessary.
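
To make the padding-bit idea concrete, here is a user-space model of it. It is only a sketch: the proposal would have the compiler do this implicitly, and the type `TrackedU7`, its field names, and the `UseOfUndefined` error are all hypothetical names invented for illustration.

```zig
const std = @import("std");

// Hypothetical model of a 7-bit integer stored in one byte, where the
// otherwise-unused eighth bit records whether the value is undefined.
const TrackedU7 = packed struct(u8) {
    value: u7,
    is_undefined: bool,

    // Models `var x: u7 = undefined;` in a safe build.
    fn initUndefined() TrackedU7 {
        return .{ .value = 0, .is_undefined = true };
    }

    // Models an ordinary store, which clears the marker bit.
    fn init(value: u7) TrackedU7 {
        return .{ .value = value, .is_undefined = false };
    }

    // Stand-in for the safety check that would run before branching on
    // the value; here it reports an error instead of panicking.
    fn get(self: TrackedU7) error{UseOfUndefined}!u7 {
        if (self.is_undefined) return error.UseOfUndefined;
        return self.value;
    }
};
```

The model still occupies a single byte, which is the point of using a padding bit: the check costs an extra branch on reads, but no extra memory.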
That leaves the following non-zero-bit runtime types which we can't represent `undefined` for:
- `uX` / `iX` for `X = 8, 16, 32, 64, ...`
- `f32` / `f64` / `f80`
- `[*c]T`, `?*T`, `?[*]T`
- `packed struct`, `packed union`
- `extern struct`, `extern union`
- `enum(T)` where `T` is one of the int types above

That's actually not bad! Most "interesting" types can represent `undefined`. Even if we don't want to increase the size of anything compared to today, that only excludes a few more types: some unions, nullable slices, and some non-pointer optionals. (Related to the last case: #104).
Drawbacks
As noted above, doing this for as many types as possible would make several types take up more memory. But in addition, the extra checks on many operations could considerably slow down builds with runtime safety. For that reason, it might be worth considering only doing this in Debug builds (i.e. not in ReleaseSafe).
Another downside is that this could make noticing undefined values in a debugger slightly harder, since they will no longer necessarily be `0xAA` (although I think it makes sense to continue using that byte pattern where we can't have a specific representation). I'm not really familiar with how debug info works: perhaps we could represent these values in the debug info, and have our pretty-printers write them as `undefined`?
There's one last, slightly more subtle, drawback. Consider the code `var a: u32 = undefined; const b = a & 0;`. Here, `b` is actually totally well-defined, but naive checks would mark it as being undefined. There are issues like this for all kinds of operations, such as `@max(undefined_u32, runtime_zero)`, `undef ^ undef`, etc. This gets worse when you consider that it's possible for only some bits to be undefined: for instance, `undef & 1` gives a result where all but the LSB are defined. Comptime evaluation currently doesn't handle these cases either, and may well never do - but even if it does, handling them at runtime seems infeasible, since it would require storing the defined state of every single bit, effectively doubling the size of everything in memory. However, in practice, I don't think this is much of an issue: code like this is incredibly rare in reality, and if you were confident in your code's correctness, you could do something like `if (std.debug.runtime_safety) 0 else undefined`. Perhaps there could even be a helper function in `std.mem` for this.
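
As a rough illustration of what such a helper might look like, here is a minimal sketch. The name `undefinedUnlessSafe` and its signature are made up for this example; it is written as a free function rather than as an actual `std.mem` addition, and it simply wraps the `if (std.debug.runtime_safety) ... else undefined` idiom mentioned above.

```zig
const std = @import("std");

// Hypothetical helper: returns a caller-chosen defined placeholder when
// runtime safety is enabled (Debug, ReleaseSafe), so that no undefined
// checks can fire, and `undefined` in unsafe build modes.
fn undefinedUnlessSafe(comptime T: type, safe_value: T) T {
    return if (std.debug.runtime_safety) safe_value else undefined;
}

// Example from the text: the result of `a & 0` is well-defined whatever
// `a` holds, so the caller opts out of undefined tracking for `a`.
fn maskedToZero() u32 {
    const a: u32 = undefinedUnlessSafe(u32, 0);
    return a & 0;
}
```

Because `std.debug.runtime_safety` is known at compile time, the branch should be resolved during compilation and cost nothing in unsafe builds.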