Skip to content

Packed struct padding bits can cause incorrect values to be observed #13480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cdwfs opened this issue Nov 7, 2022 · 12 comments
Closed

Packed struct padding bits can cause incorrect values to be observed #13480

cdwfs opened this issue Nov 7, 2022 · 12 comments
Labels
bug Observed behavior contradicts documented or intended behavior frontend Tokenization, parsing, AstGen, Sema, and Liveness. miscompilation The compiler reports success but produces semantically incorrect code.
Milestone

Comments

@cdwfs
Copy link

cdwfs commented Nov 7, 2022

Zig Version

0.10.0

Steps to Reproduce and Observed Behavior

I ran into this issue while converting my 2021 Advent of Code solutions from Zig 0.9.1 to 0.10.0. I originally reported the problem on Discord; see the thread for additional details.

Build the attached day08.zig source using zig build-exe day08.zig, and run the resulting day08.exe. The assert on line 108 will fire, indicating that the loop directly above the assert did not run correctly.

>day08.exe
thread 13716 panic: reached unreachable code
C:\users\Me\bin\zig\lib\std\debug.zig:278:14: 0x7ff70ff432c0 in assert (day08.exe.obj)
    if (!ok) unreachable; // assertion failure
             ^
C:/Users/Me/Downloads/day08.zig:108:15: 0x7ff70ff42452 in part2 (day08.exe.obj)
        assert(digit_index_for_numeral[9] >= 0 and digit_index_for_numeral[9] < 10);

Now make nearly any change to the surrounding code, such as uncommenting the load_bearing_bool definition directly above the loop on lines 95-96. Build and run again, and note that the loop now runs correctly; the assert does not fire.

Note that if -fstage1 is passed at build time to opt into the Bootstrap Compiler, the original unmodified code works as expected.
day08.zip

Expected Behavior

The assert on line 108 should not trigger, regardless of whether unrelated/unused variables are defined, or print() is called to inspect variable values, etc.

@cdwfs cdwfs added the bug Observed behavior contradicts documented or intended behavior label Nov 7, 2022
@cdwfs
Copy link
Author

cdwfs commented Nov 7, 2022

@SpexGuy poked at the source for a while and simplified it a bit further; the modified repro case is attached. The repro steps are basically the same.

This is a fun version. It asserts that:

  1. mask_a has one bit set
  2. mask_a has three trailing zeros
  3. mask_a is 0b0001000

1 and 2 pass, 3 fails xD
Like in yours, changing pretty much any line causes the bug to disappear.

day08-alt.zip

@Vexu
Copy link
Member

Vexu commented Nov 7, 2022

Cannot reproduce on latest master.

@cdwfs
Copy link
Author

cdwfs commented Nov 7, 2022

I just tested on 0.11.0-dev.86+b83e4d965 and can still reproduce the issue. If anything it's even weirder with this version: with the load_bearing_bool lines uncommented, the assert still fires, but only about 50% of the time (even just running the same day08.exe repeatedly). Without the lines uncommented, the assert still seems to fire 100% of the time.

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 7, 2022

I think both cdwfs and I are using Windows, it's possible that the bug only repros on that system. (though that would be somewhat strange since the program shouldn't use any OS APIs aside from VirtualAlloc). Given the instability of the bug, it's also possible that recent compiler changes have jostled things enough to mask it without fixing it.

@SpexGuy SpexGuy added the miscompilation The compiler reports success but produces semantically incorrect code. label Nov 7, 2022
@Vexu
Copy link
Member

Vexu commented Nov 7, 2022

Can't reproduce on 0.11.0-dev.86+b83e4d965 either.

though that would be somewhat strange since the program shouldn't use any OS APIs aside from VirtualAlloc

Can you try using a fixed buffer allocator?

@cdwfs
Copy link
Author

cdwfs commented Nov 8, 2022

@Vexu To clarify, when you say "can't reproduce", do you mean you don't see the assert firing at all? Or that it's firing consistently, whether any other changes are made to the file?

Can you try using a fixed buffer allocator?
Sure -- I replaced lines 183-185 (the arena allocator) with

var buffer: [1020*1024]u8 = undefined;
var fba = std.heap.FixedBufferAllocator.init(&buffer);
const allocator = fba.allocator();

and this does indeed seem to stop the assert from firing. But on the other hand, on my machine, so does defining an unrelated bool variable that's never used, or adding print() calls, or initializing the mask_* variables with their constant values directly instread of computing them at runtime. So I'm not sure how much I'd read into this :)

I feel like the appropriate next step is to look at the generated code for the part2() function with and without the "meaningless" changes, to see if anything looks suspicious. Is there a way to have Zig spit out that information directly?

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 8, 2022

This is really freaky. I just reproed the inconsistency you were mentioning. This is all the same executable:

Console log, collapsed
D:\Stuff $ day08.exe
thread 26852 panic: reached unreachable code
C:\Program Files\Zig\Current\lib\std\debug.zig:278:14: 0x7ff7942b38f0 in assert (day08.exe.obj)
    if (!ok) unreachable; // assertion failure
             ^
D:/Stuff/day08.zig:101:15: 0x7ff7942b24c9 in part2 (day08.exe.obj)
        assert(mask_a.mask == 0b0001000);
              ^
D:/Stuff/day08.zig:223:27: 0x7ff7942b1d35 in testPart2 (day08.exe.obj)
    var result:i64 = part2(test_input);
                          ^
D:/Stuff/day08.zig:231:18: 0x7ff7942b341d in main (day08.exe.obj)
    try testPart2(allocator);
                 ^
C:\Program Files\Zig\Current\lib\std\start.zig:377:41: 0x7ff7942b35e7 in WinStartup (day08.exe.obj)
    std.debug.maybeEnableSegfaultHandler();
                                        ^
???:?:?: 0x7fff2f707033 in ??? (???)
???:?:?: 0x7fff308e26a0 in ??? (???)

D:\Stuff $ day08.exe
a: 1000
result: 0

D:\Stuff $ day08.exe
thread 23048 panic: reached unreachable code
C:\Program Files\Zig\Current\lib\std\debug.zig:278:14: 0x7ff7942b38f0 in assert (day08.exe.obj)
    if (!ok) unreachable; // assertion failure
             ^
D:/Stuff/day08.zig:101:15: 0x7ff7942b24c9 in part2 (day08.exe.obj)
        assert(mask_a.mask == 0b0001000);
              ^
D:/Stuff/day08.zig:223:27: 0x7ff7942b1d35 in testPart2 (day08.exe.obj)
    var result:i64 = part2(test_input);
                          ^
D:/Stuff/day08.zig:231:18: 0x7ff7942b341d in main (day08.exe.obj)
    try testPart2(allocator);
                 ^
C:\Program Files\Zig\Current\lib\std\start.zig:377:41: 0x7ff7942b35e7 in WinStartup (day08.exe.obj)
    std.debug.maybeEnableSegfaultHandler();
                                        ^
???:?:?: 0x7fff2f707033 in ??? (???)
???:?:?: 0x7fff308e26a0 in ??? (???)

D:\Stuff $ day08.exe
a: 1000
result: 0

D:\Stuff $ day08.exe
a: 1000
result: 0

D:\Stuff $ day08.exe
thread 18692 panic: reached unreachable code
C:\Program Files\Zig\Current\lib\std\debug.zig:278:14: 0x7ff7942b38f0 in assert (day08.exe.obj)
    if (!ok) unreachable; // assertion failure
             ^
D:/Stuff/day08.zig:101:15: 0x7ff7942b24c9 in part2 (day08.exe.obj)
        assert(mask_a.mask == 0b0001000);
              ^
D:/Stuff/day08.zig:223:27: 0x7ff7942b1d35 in testPart2 (day08.exe.obj)
    var result:i64 = part2(test_input);
                          ^
D:/Stuff/day08.zig:231:18: 0x7ff7942b341d in main (day08.exe.obj)
    try testPart2(allocator);
                 ^
C:\Program Files\Zig\Current\lib\std\start.zig:377:41: 0x7ff7942b35e7 in WinStartup (day08.exe.obj)
    std.debug.maybeEnableSegfaultHandler();
                                        ^
???:?:?: 0x7fff2f707033 in ??? (???)
???:?:?: 0x7fff308e26a0 in ??? (???)

D:\Stuff $ day08.exe
a: 1000
result: 0

AFAIK the only things that could cause randomness in this program are

  • address space layout randomization
  • changes in pointer values returned by VirtualAlloc

So somehow the high bits of a pointer must be influencing the result.

I've attached the executable in case anyone wants to mess with it, I wonder if it repros under wine.
day08.zip

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 8, 2022

Figured it out. The problem is the top bit of the u7. Here's the relevant part of the disassembly:

; var mask_a = std.bit_set.IntegerBitSet(7){
;     .mask = display.digit_masks[digit_index_for_numeral[7]].mask ^ mask_cf.mask,
; };
    ; ...
    mov     cl, byte ptr [rsp+429h]
    and     cl, 80h
    or      al, cl
    mov     byte ptr [rsp+429h], al
; ...
; assert(mask_a.mask == 0b0001000);
    mov     al, byte ptr [rsp+429h]
    sub     al, 8
    sete    cl
    call    day08!assert (7ff7942b38c0)

When the u7 is written to memory, the write preserves the top bit for some reason. Then later when it's read, the top bit isn't masked off. So it gets whatever happened to be on the stack in the top bit, which I guess in this particular program happens to be a pointer, and memory space randomization causes it to sometimes be 1 and sometimes 0. I'll see if I can come up with a more consistent repro.

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 8, 2022

Here's the smallest repro I can find. It seems to require indexing an array of packed structs with padding.

const BitSet = packed struct {
    mask: u7,
};

const Display = struct {
    // This array has to be large.  Using separate padding
    // plus a one item array doesn't work.
    digit_masks: [10]BitSet = undefined,
};

fn testPart2() void {
    var display = Display{};
    display.digit_masks[9] = .{ .mask = 0b0001000 };

    smash();
    stackUp(display);
}

fn smash() void {
    var stack: [0x500]u8 = undefined;
    @memset(&stack, 0xFF, stack.len);
}

fn stackUp(display: Display) void {
    var space: [0x100]u8 = undefined;
    part2(display);
    _ = space;
}

fn part2(display: Display) void {
    // var mask_a = display.digit_masks[9] works without problems
    var mask_a = BitSet{
        .mask = display.digit_masks[9].mask,
    };
    if (@popCount(mask_a.mask) != 1) unreachable; // this passes
    if (@ctz(mask_a.mask) != 3) unreachable; // this passes
    if (mask_a.mask != 0b0001000) unreachable; // but this fails
}

pub fn main() void {
    testPart2();
}

Repros on godbolt: https://zig.godbolt.org/z/9sY3WjdWs

Edit: It only repros sometimes on godbolt. If it doesn't work, reload the page a few times.

@SpexGuy SpexGuy added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Nov 8, 2022
@SpexGuy SpexGuy changed the title Mysterious behavior changes when adding/removing unrelated code (stage2 compiler only) Packed struct padding bits can cause incorrect values to be observed Nov 8, 2022
@Vexu Vexu added this to the 0.10.1 milestone Nov 8, 2022
@Vexu
Copy link
Member

Vexu commented Nov 23, 2022

Due to the integer backed packed struct semantics this is actually the same issue as #11263

Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022
Vexu added a commit to Vexu/zig that referenced this issue Nov 23, 2022
Vexu added a commit to Vexu/zig that referenced this issue Nov 25, 2022
Initializing packed structs with result location is problematic
since storing to a packed struct field pointer involves loading
the field first which can cause loading uninitialized memory unless
specifically zeroed first. Since packed structs are integer backed
now using aggregate init makes more sense and avoids this issue.

Closes ziglang#13480
Vexu added a commit to Vexu/zig that referenced this issue Nov 25, 2022
Initializing packed structs with result location is problematic
since storing to a packed struct field pointer involves loading
the field first which can cause loading uninitialized memory unless
specifically zeroed first. Since packed structs are integer backed
now using aggregate init makes more sense and avoids this issue.

Closes ziglang#13480
@andrewrk andrewrk modified the milestones: 0.10.1, 0.11.0 Jan 10, 2023
@Validark
Copy link
Contributor

Validark commented Jun 4, 2023

Not sure if this is the same issue @Vexu but I just ran into this:

const std = @import("std");

const Node = packed struct {
    const @"c/d" = packed struct(u40) {
        c: u32 = 0,
        d: u8 = 0,
    };

    const @"a/b" = packed struct(u40) {
        a: u32 = 0,
        b: u8 = 0,
    };

    @"a/b": @"a/b" = .{},
    @"c/d": @"c/d" = .{},

    pub fn init(d: u8) Node {
        return Node{
            // These fields work okay
            .@"a/b" = .{ .a = d, .b = d + 1 },
            // Writing to either of these fields will not work:
            .@"c/d" = .{ .c = d, .d = d + 1 },
        };
    }
};

pub fn main() !void {
    const node = Node.init(9);
    std.debug.print("{any}\n", .{node});
}

@jacobly0
Copy link
Member

jacobly0 commented Jul 29, 2023

Closing as a duplicate of the newer #14200 because I just so happened to fix that one first. @SpexGuy's explanation perfectly matches the issue I fixed and that reduction crashes without my fix and succeeds with my fix.

@Validark your issue is much more similar to #14305 than this issue (related to packed types inside other packed types, not inside unpacked types) nevermind, it seems unique #16609

@jacobly0 jacobly0 closed this as not planned Won't fix, can't repro, duplicate, stale Jul 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior frontend Tokenization, parsing, AstGen, Sema, and Liveness. miscompilation The compiler reports success but produces semantically incorrect code.
Projects
None yet
Development

No branches or pull requests

6 participants