Skip to content

Specification of behavior for safety-disabled bare unions #20232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mnemnion opened this issue Jun 8, 2024 · 11 comments
Open

Specification of behavior for safety-disabled bare unions #20232

mnemnion opened this issue Jun 8, 2024 · 11 comments
Milestone

Comments

@mnemnion
Copy link

mnemnion commented Jun 8, 2024

An discussion on Ziggit about safety tagging and bare unions prompted me to open this issue, as a sounding board for specifying this corner of Zig's behavior.

Here's the motivating code:

// safety checked union
pub const CheckedUnion = union {
    int: i64,
    float: f64,
};

// unchecked union
pub const UncheckedUnion = blk: {
    @setRuntimeSafety(false);
    break :blk union {
        int: i64,
        float: f64,
    };
};

const unchecked_array: [1]UncheckedUnion = .{UncheckedUnion{ .float = 3.14139 }};
const checked_array: [1]CheckedUnion = .{CheckedUnion{ .float = 2.71828 }};

// evade compile-time safety checking
fn runtimePassUnchecked() []const UncheckedUnion {
    return &unchecked_array;
}
fn runtimePassChecked() []const CheckedUnion {
    return &checked_array;
}

test "safe and unsafe" {
    @setRuntimeSafety(true);
    std.debug.print("\nsize of CheckedUnion is {}\n", .{@sizeOf(CheckedUnion)});
    std.debug.print("size of UncheckedUnion is {}\n", .{@sizeOf(UncheckedUnion)});
    // illustrates that behavior is correct for correct access
    const union_int = UncheckedUnion{ .int = 64 };
    std.debug.print("value of union_int is {}\n", .{union_int.int});
    const union_float = UncheckedUnion{ .float = 3.14159 };
    std.debug.print("value of union_float is {}\n", .{union_float.float});
    const unchecked_union_slice = runtimePassUnchecked();
    // This is VERY ILLEGAL and is meant to illustrate the actual behavior,
    // which is that the program doesn't panic, and prints nonsense
    std.debug.print("UNDEFINED BEHAVIOR {}\n", .{unchecked_union_slice[0].int});
    const checked_union_slice = runtimePassChecked();
    // panic: access of union field 'int' while field 'float' is active
    std.debug.print("panic! at the runtime {}\n", .{checked_union_slice[0].int});
}

This is with 0.12.0, I'll check it with the new release within a few days.

What's going on here is that with @setRuntimeSafety(false), the type generated for the union doesn't have the safety tag, which is why it's smaller. This also illustrates that, at least for this test, the behavior of that type with @setRuntimeSafety(true) is unchecked.

Reading the documentation, I believe this is the only example of unchecked behavior escaping into a safety-checked context.

I feel strongly that this behavior is the correct one, and that the specification (when and as it is written) should make it clear that this combination of operations will have the result which it does: specifying both that using an unchecked union correctly in a safety checked scope will produce correct results, and also specifying that a bare union created in an unchecked scope will exhibit undefined behavior if misused in either a checked or unchecked scope.

As I discuss in the Discourse thread, I'm working on a VM which will rely on this working. I don't want to have to build the entire library in a Fast/Small mode in order to make the union smaller, and disable the runtime check, because it will happen on the dispatch of every instruction. VM dispatch loops have a way of confusing the branch predictor, so I can't count on that switch being cold, and leaving the safety tag on would mean that each instruction was usize * 2 in stride, so I get half as many instructions per cache line.

In other words, not only is this the only case where unchecked behavior can escape into a checked context, but it's good that it does, and its very important that later changes to the compiler don't change this behavior.

How this is expressed in the standard is less important: I would describe the current behavior as "the safety-checked status of a union type is a property of that type, which it inherits from the safety-check status of the scope in which it's defined".

@rohlem
Copy link
Contributor

rohlem commented Jun 8, 2024

I didn't expect the constructed type's size to change at all based on the creation scope's runtime safety, interesting that it does.

I feel strongly that this behavior is the correct one

I understand your reasoning, and clearly you've put a lot of thought into this.
I still think it might be unexpected in some cases, say you move a type out of a function and inadvertently add or remove runtime safety at every usage site.
But I guess it's really intended this way by the langref.
Someone wanting to force runtime safety should be using union(enum) anyway.

What's interesting though is that while union(enum) exists for forcing the tag,
for untagged unions (regardless of safety setting) status-quo only provides extern union.
I assume you prefer the block + @setRuntimeSafety(false) in the example because extern union has some drawbacks in your use case? (Probably something related to some system-C-ABI restrictions extern enforces?)
Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

(There's also packed union, but packed comes with its own restrictions,
and will additionally require a wrapper packed struct to pad all types to the same size, see #19754 .)

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jun 8, 2024
@Vexu Vexu added this to the 0.14.0 milestone Jun 8, 2024
@mnemnion
Copy link
Author

mnemnion commented Jun 8, 2024

Someone wanting to force runtime safety should be using union(enum) anyway.

This is a large part of my reasoning, yes. The use of a bare union is harmful unless the code is intended to be correct without the safety tag, and this implies an intention to turn off the safety check at some point. If you don't, you have a tag following every instance of the type around, getting checked on access, but you can't use the safety tag. The only way it can help you (and it's very helpful!) is by crashing if the code is wrong.

I assume you prefer the block + @setRuntimeSafety(false) in the example because extern union has some drawbacks in your use case?

The usual: you can't put 'fancy' Zig types into something extern, so it's limited in that way. Zig bare union types are even more opaque than extern union, because with no defined memory layout, there's no standard-compliant way of examining a specific part of the punned memory to see which type of union you're dealing with. You can sneak info out of a C union because every variant has a defined layout, you know where all the bits are. Any attempt to do the same with a Zig union would be undefined behavior, brittle, dangerous, and a terrible idea.

But there are occasions when you can know from program structure what's inside a union, without examining it, and I discovered the behavior which prompted this issue while writing one. One variant is a u64, and the other is a complex tagged union(enum) which fits in a machine word. It's for VM instructions, and proper function of the VM will never try and decode the bare variant, which is a bitmask provided for other instructions to use (which advance the instruction pointer past that bitmask).

for untagged unions (regardless of safety setting) status-quo only provides extern union.

I would put this differently I think. extern union can't be safety-tagged, because it has to follow the C ABI contract, Zig unions are safety-tagged in safety-checked because it's possible.

Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

This is the mechanism! It works just fine, and my interest is in making sure that future compiler development doesn't undermine the behavior. There's no getting around the tradeoff: without the tag, you can't safety-check the union. Having written some gnarly union-using code in C, being able to wear a seatbelt while getting it right is extremely helpful. Once the code is correct, the performance advantage of removing the tag is compelling, at least for my use case.

For completeness, here's the other side of the equation: What happens if we generate an array of the safety-tagged bare union with runtime safety off, and then address the wrong variant, also with runtime safety off?

This should be run as a postscript to the code in the first post:

const checked_array_2 = blk: {
    @setRuntimeSafety(false);
    const temp: [2]CheckedUnion = .{ CheckedUnion{ .float = 2.7182 }, CheckedUnion{ .float = 3.14159 } };
    break :blk temp;
};

fn runtimePassChecked2() []const CheckedUnion {
    return &checked_array_2;
}

test "unsafe checked enum" {
    @setRuntimeSafety(false);
    const checked_union_slice = runtimePassChecked2();
    // demonstrate that the check tag is present:
    const ptr_0 = @intFromPtr(&checked_union_slice[0]);
    const ptr_1 = @intFromPtr(&checked_union_slice[1]);
    // size is still 16 bytes
    std.debug.print("\noffset width of array is {}\n", .{ptr_1 - ptr_0});
     // no runtime safety == undefined behavior:
    std.debug.print("UNDEFINED BEHAVIOR {}\n", .{checked_union_slice[0].int});
    // runtime safety == panic:
    {
        @setRuntimeSafety(true);
        std.debug.print("time to panic! {}", .{checked_union_slice[0].int});
    }
}

Answer: the tag is still there, but the safety-checking code isn't generated. This is what I expected would happen, and this seems basically correct to me, I don't see how the compiler could do anything else and be in any way consistent.

But this conclusively shows that the safety-tag itself is part of the type, and inherited from the runtime safety setting of the scope in which the type is defined. Without that, the compiler can't generate the safety check, so it doesn't, and conversely, it will generate the tag anyway with safety off, but not the code which checks that tag at runtime.

As an aside, the ability to control safety on a block-by-block level is a stellar language feature. Even with no granularity (all checks are either on, or they're off), it's a level of safety control which hasn't been seen since Ada. I wouldn't mind the ability to flip those features on and off individually, but the status quo is already very good.

@rohlem
Copy link
Contributor

rohlem commented Jun 8, 2024

Should we provide a mechanism for forcing a union to be untagged that isn't tied to extern?

This is the mechanism!

Right, I agree that it works correctly and should continue to work this way.

I really just think that additionally having special syntax, like untagged union { ... } would be clearer to read and easier to write
than status-quo u: {@setRuntimeSafety(false); break :u union { ... };}.
Also less prone to accidentally overlook a @setRuntimeSafety when moving code around.

It was just an idea though, if nobody else wants it then status-quo is seemingly good enough.

EDIT: Per below comment, it's true this is really just about the safety, so maybe the keyword for this shouldn't be untagged but more akin to unchecked.

@mnemnion
Copy link
Author

mnemnion commented Jun 8, 2024

The issue I see with that is that the safety tag is, well, a safety feature, it doesn't affect the union's semantics unless there's an error in the code. So interacting with it through the safety system makes the most sense. Thanks for clarifying what you meant though.

@dvmason
Copy link

dvmason commented Jan 18, 2025

Who thought that having untagged unions be secretly tagged was a good idea?

I think it's brilliant that Zig has tagged unions. Even something that allowed you to tag a union without having to write an enum, like:

const tagged = union () {a:u64,b:i64};

would be lovely. But untagged should be untagged.

It's great that there is a workaround that @mnemnion shows works, but it's pretty ugly! At least any functions defined inside the internal, really-untagged, union have safety on.

I'm also implementing a VM and have my own way of tagging values, and will have millions of instances of my union, so doubling the size is a non-starter!

As for why not extern union? I am currently trying to find a way around erroneous flagging of "dependency loops" and I hoped that extern might be the source of it, or possibly callconv so I am trying to remove the extern annotation everywhere. Fortunately I only have 2 unions, so this substitution isn't too painful.

@mlugg
Copy link
Member

mlugg commented Jan 18, 2025

But untagged should be untagged.

Semantically, untagged is untagged; this is just a safety feature, and it's an incredibly helpful one. You're free to compile without runtime safety enabled if you want.

As for why not extern union? I am currently trying to find a way around erroneous flagging of "dependency loops" and I hoped that extern might be the source of it [...]

It isn't; if you mean extern union, write extern union.

If you have an issue with false dependency loops, feel free to open an issue.


Please don't use this issue to argue against the existence of this language feature.

@dvmason
Copy link

dvmason commented Jan 18, 2025

Thanks for the quick response. Once problem with extern is that it is infectious... if I declare this, then everything that uses it has to be extern, etc.

Unless I'm missing something, it isn't semantically neutral. If I have an array of the untagged union values, I can't look at them... they don't have a tag, so I can't switch on them, and I can't even @bitCast them.
I have:

pub const Code = union {
        int: i64,
        uint: u64,
        object: Object,
};
 ...
          for (&self.code) |*c| {
                if (c.object.isIndexSymbol0()) {
                    const index = c.object.indexNumber();
                    c.* = Code.objectOf(replacements[index]);

and I thought, from your assertion, that I could perhaps do something like:

          for (&self.code) |*c| {
                switch (c.*) {
                    .object => |o|
                        if (o.isIndexSymbol0()) {

but I get:

execute.zig:495:26: error: switch on union with no attached enum
execute.zig:300:16: note: consider 'union(enum)' here

I wasn't arguing that this is related to dependency loops... simply that untagged unions don't seem useful if they are secretly tagged. I see only 3 non-test uses in the zig library: Build/Step/CheckObject.zig, Target.zig, and zig/Zir.zig (my search may be incomplete) so I wonder what the use-case is.

I'll file an issue on dependency loops, although #16932 pretty much captures it.

@rohlem
Copy link
Contributor

rohlem commented Jan 18, 2025

untagged unions don't seem useful if they are secretly tagged. [...] I wonder what the use-case is.

@dvmason Code is supposed to treat untagged unions as untagged.
That means that the tag information is managed at some other point (maybe information in one or several outside fields),
but every access to the union is to the union field that was previously written to the union:

const U = union {u: u8, i: i8};
var u: U = .{.u = 200};

//all access to the field it was last overwritten with
u.u += 20;
f(&u.u);
// accessing u.i here is illegal behavior

u = .{.i = -80};
g(&u.i);
// accessing u.u here is illegal behavior

The secret tag in debug modes is strictly meant to help spot illegal behavior (= accesses to an inactive union field) at runtime.

All that being said, I agree that extern union is restrictive, and often not the right tool for the job - it depends on what your use case actually requires.
If you need/want to reinterpret memory, I would probably suggest an underlying untyped buffer [@max(@sizeOf(A), @sizeOf(B))]u8 and using @bitCast for the reinterpretation.
(Note that with union, Zig gives no guarantees what state unused bits are in. With a manual buffer, you're in full control: The bits will keep the state that you last wrote to them.)

EDIT: In your first code sample, you correctly access c.object - if you know that object is the active state, then that is the correct usage.
If you need to set the union to that state, assign the whole union (c.* = .{.object = ...}), because even writing to an inactive field (c.object = ...) is illegal.
The switch in the second sample is incorrect, because the type is declared as an untagged union. (If the compiler allowed this, then the code would break in optimized modes where the compiler didn't include the secret tag.)

@dvmason
Copy link

dvmason commented Jan 19, 2025

Thanks, @rohlem, I understand the semantics, and your answer is very clear. I now understand this is all about Zig's intention to safety-check as much Undefined Behaviour as possible in safe compilation modes.

Anyone coming from another language (like C or Rust) would likely find the fact that there are tagged unions a nice feature, but would assume that untagged unions would be just that. However, accessing such fields is Undefined Behaviour (as Rust also acknowledges) and therefore, Zig must safety-check it if possible.

This gives me a zero-cost work-around, which is to have a function that does the unsafe operation:

        pub inline fn asObject(self: Code) Object {
            @setRuntimeSafety(false);
            return self.object;
        }

For me this is a cleaner solution than the one @mnemnion described, but their use-case may be different, and it's nice that both work.

From my looking at the library and its rare use of untagged unions (15 occurrences (including 12 tests) vs 108 tagged (some tests)), and the fact that you can say union(enum) to have it infer the tag, I can frankly see little justification to have the compiler complications to safety-check this particular Undefined Behaviour but the code is already there, so it's a sunk cost. It won't affect me, as I'll just use the function above, which just costs 4 extra characters to type when accessing a unknown field. (And actually, most of my accesses are compliant anyway so I don't need to use this very often and, who knows, the safety check might actually catch a bug!)

Thanks again.

@mnemnion
Copy link
Author

@dvmason that function may not do what you need it to. It will eliminate the safety check, even in debug mode, but the safety check is very cheap and it's unlikely to be worthwhile to eliminate it while debugging.

What it won't do is get rid of the debug tag, which Zig needs in order to insert the safety check to begin with. To do that requires creating the type inside a runtime-unsafe block, as illustrated in the first post. So the 'unsafe block' pattern you're proposing will result in the union still carrying the debug tag around, just, never using it (unless you don't use the accessor pattern).

I don't consider creating the type within an unsafe block to be a 'workaround', to be clear, but rather, an important compromise between the goal of safety-checking all illegal behavior in appropriate modes, and the not-infrequent need for bare unions to be of an exact defined size.

In the VM I've been working on (which prompted this issue), opcodes are a tagged union of instruction types, but embedded in a bare union where the other variant is a full u64, used for bitmasking. I've found it rather useful to be able to diagnose errors by turning runtime safety back on, but that only works for a subset of the bit vector types, some of them are @bitCast as a slice to a RuneSet, which has no provision for the width of the opcodes being any greater than a u64.

So for proper function of the program I need to be able to turn off the debug tag most of the time, but want to keep other affordances like bounds checking active during development. An extern union would be a drag, since the great majority of the opcodes live in a tagged union, and those are incompatible.

But this isn't the only use case for bare unions, and so the default should be to have the safety-checking available. I filed this issue because I feel strongly that the status quo strikes the right balance, and wanted that to be tracked. As well as, perhaps, documented better at some point, like many things Zig the documentation on this corner of the language is a bit thin at present.

It's an open question how many use cases for bare unions can tolerate being 'secretly' wider than they otherwise would be, but the answer isn't none, and runtime assistance with designing code to use them properly is valuable.

@dvmason
Copy link

dvmason commented Jan 19, 2025

@mnemnion thanks for the comment.

I have a similar situation to yours (writing a VM) but I have one particular place where I have an array of values (the union in question) that I have to scan through, and at least right now, those were created by assigning different branches of the union, and it's not worth figuring out which kind it is, so my accessor function solves my problem while leaving the union untagged for production and possibly catching errors in other locations while in test mode.

@andrewrk andrewrk removed the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Feb 9, 2025
@andrewrk andrewrk modified the milestones: 0.14.0, 1.0.0 Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants