Specification of behavior for safety-disabled bare unions #20232
Comments
I didn't expect the constructed type's size to change at all based on the creation scope's runtime safety, interesting that it does.
I understand your reasoning, and clearly you've put a lot of thought into this. What's interesting though is that while (There's also |
This is a large part of my reasoning, yes. The use of a bare
The usual: you can't put 'fancy' Zig types into something But there are occasions when you can know from program structure what's inside a union, without examining it, and I discovered the behavior which prompted this issue while writing one. One variant is a
I would put this differently I think.
This is the mechanism! It works just fine, and my interest is in making sure that future compiler development doesn't undermine the behavior. There's no getting around the tradeoff: without the tag, you can't safety-check the union. Having written some gnarly union-using code in C, being able to wear a seatbelt while getting it right is extremely helpful. Once the code is correct, the performance advantage of removing the tag is compelling, at least for my use case.

For completeness, here's the other side of the equation: what happens if we generate an array of the safety-tagged bare union with runtime safety off, and then address the wrong variant, also with runtime safety off? This should be run as a postscript to the code in the first post:

    const checked_array_2 = blk: {
        @setRuntimeSafety(false);
        const temp: [2]CheckedUnion = .{ CheckedUnion{ .float = 2.7182 }, CheckedUnion{ .float = 3.14159 } };
        break :blk temp;
    };

    fn runtimePassChecked2() []const CheckedUnion {
        return &checked_array_2;
    }

    test "unsafe checked enum" {
        @setRuntimeSafety(false);
        const checked_union_slice = runtimePassChecked2();
        // demonstrate that the check tag is present:
        const ptr_0 = @intFromPtr(&checked_union_slice[0]);
        const ptr_1 = @intFromPtr(&checked_union_slice[1]);
        // size is still 16 bytes
        std.debug.print("\noffset width of array is {}\n", .{ptr_1 - ptr_0});
        // no runtime safety == undefined behavior:
        std.debug.print("UNDEFINED BEHAVIOR {}\n", .{checked_union_slice[0].int});
        // runtime safety == panic:
        {
            @setRuntimeSafety(true);
            std.debug.print("time to panic! {}", .{checked_union_slice[0].int});
        }
    }

Answer: the tag is still there, but the safety-checking code isn't generated. This is what I expected would happen, and it seems basically correct to me; I don't see how the compiler could do anything else and remain consistent. But this conclusively shows that the safety tag itself is part of the type, and is inherited from the runtime safety setting of the scope in which the type is defined. Without the tag, the compiler can't generate the safety check, so it doesn't; conversely, it will still generate the tag when the type is used with safety off, just not the code which checks that tag at runtime.

As an aside, the ability to control safety on a block-by-block level is a stellar language feature. Even with no granularity (all checks are either on, or they're off), it's a level of safety control which hasn't been seen since Ada. I wouldn't mind the ability to flip those features on and off individually, but the status quo is already very good. |
Right, I agree that it works correctly and should continue to work this way. I really just think that additionally having special syntax, like It was just an idea though, if nobody else wants it then status-quo is seemingly good enough. EDIT: Per below comment, it's true this is really just about the safety, so maybe the keyword for this shouldn't be |
The issue I see with that is that the safety tag is, well, a safety feature, it doesn't affect the union's semantics unless there's an error in the code. So interacting with it through the safety system makes the most sense. Thanks for clarifying what you meant though. |
Who thought that having untagged unions be secretly tagged was a good idea? I think it's brilliant that Zig has tagged unions. Even something that allowed you to tag a union without having to write an enum, like:
would be lovely. But untagged should be untagged. It's great that there is a workaround that @mnemnion shows works, but it's pretty ugly! At least any functions defined inside the internal, really-untagged, union have safety on. I'm also implementing a VM and have my own way of tagging values, and will have millions of instances of my union, so doubling the size is a non-starter! As for why not |
Semantically, untagged is untagged; this is just a safety feature, and it's an incredibly helpful one. You're free to compile without runtime safety enabled if you want.
It isn't; if you mean If you have an issue with false dependency loops, feel free to open an issue. Please don't use this issue to argue against the existence of this language feature. |
Thanks for the quick response. One problem with Unless I'm missing something, it isn't semantically neutral. If I have an array of the untagged union values, I can't look at them... they don't have a tag, so I can't switch on them, and I can't even

    pub const Code = union {
        int: i64,
        uint: u64,
        object: Object,
    };
    ...
    for (&self.code) |*c| {
        if (c.object.isIndexSymbol0()) {
            const index = c.object.indexNumber();
            c.* = Code.objectOf(replacements[index]);

and I thought, from your assertion, that I could perhaps do something like:

    for (&self.code) |*c| {
        switch (c.*) {
            .object => |o|
                if (o.isIndexSymbol0()) {

but I get:
I wasn't arguing that this is related to dependency loops... simply that untagged unions don't seem useful if they are secretly tagged. I see only 3 non-test uses in the zig library: Build/Step/CheckObject.zig, Target.zig, and zig/Zir.zig (my search may be incomplete) so I wonder what the use-case is. I'll file an issue on dependency loops, although #16932 pretty much captures it. |
@dvmason Code is supposed to treat untagged unions as untagged.

    const U = union { u: u8, i: i8 };
    var u: U = .{ .u = 200 };
    // every access must go to the field that was last written
    u.u += 20;
    f(&u.u);
    // accessing u.i here is illegal behavior
    u = .{ .i = -80 };
    g(&u.i);
    // accessing u.u here is illegal behavior

The secret tag in debug modes is strictly meant to help spot illegal behavior (= accesses to an inactive union field) at runtime. All that being said, I agree that EDIT: In your first code sample, you correctly access |
Thanks, @rohlem, I understand the semantics, and your answer is very clear. I now understand this is all about Zig's intention to safety-check as much Undefined Behaviour as possible in safe compilation modes. Anyone coming from another language (like C or Rust) would likely find tagged unions a nice feature, but would assume that untagged unions are just that. However, accessing an inactive field is Undefined Behaviour (as Rust also acknowledges), and therefore Zig must safety-check it if possible. This gives me a zero-cost workaround, which is to have a function that does the unsafe operation:

    pub inline fn asObject(self: Code) Object {
        @setRuntimeSafety(false);
        return self.object;
    }

For me this is a cleaner solution than the one @mnemnion described, but their use case may be different, and it's nice that both work. From my looking at the library and its rare use of untagged unions (15 occurrences (including 12 tests) vs 108 tagged (some tests)), and the fact that you can say Thanks again. |
@dvmason that function may not do what you need it to. It will eliminate the safety check, even in debug mode, but the safety check is very cheap and it's unlikely to be worthwhile to eliminate it while debugging. What it won't do is get rid of the debug tag, which Zig needs in order to insert the safety check to begin with. To do that requires creating the type inside a runtime-unsafe block, as illustrated in the first post. So the 'unsafe block' pattern you're proposing will result in the union still carrying the debug tag around, just never using it (unless you don't use the accessor pattern).

I don't consider creating the type within an unsafe block to be a 'workaround', to be clear, but rather an important compromise between the goal of safety-checking all illegal behavior in appropriate modes, and the not-infrequent need for bare unions to be of an exact defined size. In the VM I've been working on (which prompted this issue), opcodes are a tagged union of instruction types, but embedded in a bare union where the other variant is a full So for proper function of the program I need to be able to turn off the debug tag most of the time, but want to keep other affordances like bounds checking active during development. An But this isn't the only use case for bare unions, and so the default should be to have the safety-checking available.

I filed this issue because I feel strongly that the status quo strikes the right balance, and wanted that to be tracked, and perhaps documented better at some point: like many things Zig, the documentation on this corner of the language is a bit thin at present. It's an open question how many use cases for bare unions can tolerate being 'secretly' wider than they otherwise would be, but the answer isn't none, and runtime assistance with designing code to use them properly is valuable. |
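To make the accessor point concrete, here is a minimal sketch. The `Code` union and `asInt` below are hypothetical stand-ins modeled on the snippets in this thread, not anyone's actual VM code. Disabling runtime safety inside an accessor only changes the code generated for that function; `@sizeOf` reports a property of the type, so the accessor cannot affect it. The size printed will depend on the build mode and compiler version.

```zig
const std = @import("std");

// Hypothetical bare union, defined in an ordinary (safety-inheriting) scope.
const Code = union {
    int: i64,
    uint: u64,

    // Accessor that skips the active-field check for this one read.
    // It does not alter the layout of `Code`.
    pub inline fn asInt(self: Code) i64 {
        @setRuntimeSafety(false);
        return self.int;
    }
};

pub fn main() void {
    const c = Code{ .int = 42 };
    // The reported size is whatever layout the type was given when it was defined.
    std.debug.print("value: {}, @sizeOf(Code): {}\n", .{ c.asInt(), @sizeOf(Code) });
}
```

Building this once in Debug and once in ReleaseFast is one way to see whether the type carries the extra tag in a given mode.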
@mnemnion thanks for the comment. I have a similar situation to yours (writing a VM) but I have one particular place where I have an array of values (the |
A discussion on Ziggit about safety tagging and bare unions prompted me to open this issue, as a sounding board for specifying this corner of Zig's behavior.
Here's the motivating code:
This is with 0.12.0; I'll check it with the new release within a few days.
What's going on here is that with @setRuntimeSafety(false), the type generated for the union doesn't have the safety tag, which is why it's smaller. This also illustrates that, at least for this test, the behavior of that type with @setRuntimeSafety(true) is unchecked.
Reading the documentation, I believe this is the only example of unchecked behavior escaping into a safety-checked context.
I feel strongly that this behavior is the correct one, and that the specification (when and as it is written) should make it clear that this combination of operations will have the result which it does: specifying both that using an unchecked union correctly in a safety checked scope will produce correct results, and also specifying that a bare union created in an unchecked scope will exhibit undefined behavior if misused in either a checked or unchecked scope.
As I discuss in the Discourse thread, I'm working on a VM which will rely on this working. I don't want to have to build the entire library in a Fast/Small mode in order to make the union smaller and disable the runtime check, because the check would happen on the dispatch of every instruction. VM dispatch loops have a way of confusing the branch predictor, so I can't count on that switch being cold, and leaving the safety tag on would mean that each instruction was usize * 2 in stride, so I'd get half as many instructions per cache line.
In other words, not only is this the only case where unchecked behavior can escape into a checked context, but it's good that it does, and it's very important that later changes to the compiler don't change this behavior.
How this is expressed in the standard is less important: I would describe the current behavior as "the safety-checked status of a union type is a property of that type, which it inherits from the safety-check status of the scope in which it's defined".
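A minimal sketch of the property described above, assuming the 0.12.0 behavior reported in this issue: `CheckedUnion` takes its name from the snippet elsewhere in this thread, while `UncheckedUnion` and the field layout are hypothetical. One union is defined at file scope and the other inside a block with runtime safety disabled, and the program prints both sizes. Whether the two differ depends on compiler version and build mode, so no output is claimed here.

```zig
const std = @import("std");

// Defined in an ordinary scope: inherits the build mode's safety setting.
const CheckedUnion = union {
    int: i64,
    float: f64,
};

// Defined inside a block where runtime safety has been switched off.
const UncheckedUnion = blk: {
    @setRuntimeSafety(false);
    break :blk union {
        int: i64,
        float: f64,
    };
};

pub fn main() void {
    std.debug.print("checked: {} bytes, unchecked: {} bytes\n", .{
        @sizeOf(CheckedUnion),
        @sizeOf(UncheckedUnion),
    });
}
```

If the sizes match in a safe build mode, the compiler in use no longer ties the tag to the defining scope in the way this issue describes.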