Skip to content

std.process.Child.run failing depending on Release options #21756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SHIPWECK opened this issue Oct 20, 2024 · 9 comments · Fixed by #22691
Closed

std.process.Child.run failing depending on Release options #21756

SHIPWECK opened this issue Oct 20, 2024 · 9 comments · Fixed by #22691
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@SHIPWECK
Copy link

Zig Version

0.14.0-dev.719+f21928657

Steps to Reproduce and Observed Behavior

Create a file bug.zig and paste the following code:

const std = @import("std");

pub fn main() !void {
    var alloc = std.heap.page_allocator;

    const run_result = try std.process.Child.run(.{
        .allocator = alloc,
        .argv = &.{ "zig", "version" },
    });

    defer {
        alloc.free(run_result.stderr);
        alloc.free(run_result.stdout);
    }

    std.debug.print("{s}", .{run_result.stdout});
}

test "Child.run" {
    var alloc = std.heap.page_allocator;

    const run_result = try std.process.Child.run(.{
        .allocator = alloc,
        .argv = &.{ "zig", "version" },
    });

    defer {
        alloc.free(run_result.stderr);
        alloc.free(run_result.stdout);
    }

    try std.testing.expectEqualStrings(
        "0.14.0-dev.719+f21928657\n",
        run_result.stdout,
    );
}

Commands and output are listed below:

PS> zig run bug.zig -ODebug
0.14.0-dev.719+f21928657
PS> zig run bug.zig -OReleaseSafe
0.14.0-dev.719+f21928657
PS> zig run bug.zig -OReleaseSmall
error:
PS> zig run bug.zig -OReleaseFast
error:
PS> zig test bug.zig -ODebug
All 1 tests passed.
PS> zig test bug.zig -OReleaseSafe
All 1 tests passed.
PS> zig test bug.zig -OReleaseSmall
All 1 tests passed.
PS> zig test bug.zig -OReleaseFast
1/1 bug.test.Child.run...FAIL ()
0 passed; 0 skipped; 1 failed.
error: the following test command failed with exit code 1:
C:\Users\____\AppData\Local\zig\o\f08007e9b8d201450c8ddf23221d5620\test.exe --seed=0x21fb9850

Expected Behavior

Identical output on all release modes on which should be the zig version: 0.14.0-dev.719+f21928657.

@SHIPWECK SHIPWECK added the bug Observed behavior contradicts documented or intended behavior label Oct 20, 2024
@rohlem
Copy link
Contributor

rohlem commented Oct 20, 2024

What windows version (and CPU architecture) did you test this on?

I ran the test on Windows 11 Home Version 21H2.
Using zig version 0.14.0-dev.1924+bdd3bc056, after adjusting the hardcoded version string,
the only error I could reproduce was the empty error: output using zig run -OReleaseSmall.
All other combinations/commands worked without issue.

Can you maybe check with a newer nightly build whether other configurations still fail on your system?

@SHIPWECK
Copy link
Author

SHIPWECK commented Oct 21, 2024

I'm running Windows 11 home Version 23H2 on a 64-bit operating system, x64-based processor. I'll upgrade to a newer build and run the same tests. I will edit this message when I do.

Edit:
Using 0.14.0-dev.1951+857383689, i get the same results as you. Only -OReleaseSmall leads to the empty error: . In addition, all the zig tests I ran also work properly.

This is my first bug report, so I don't really know what to do now. Please advise.

@squeek502
Copy link
Collaborator

squeek502 commented Oct 21, 2024

I can also reproduce the zig run -OReleaseSmall bug on Windows 10. It doesn't seem to be related to zig run, specifically, as zig build-exe and then running the compiled exe still reproduces the bug, too.

Both the ReleaseSmall and ReleaseFast bugs with zig run also reproduce with Zig 0.13.0 so this doesn't seem to be a recent regression.

EDIT: Compiling with -ferror-tracing and -fno-strip gives this error trace, don't know what to make of it:

> zig run 21756.zig -OReleaseSmall -ferror-tracing -fno-strip
error:
C:\Users\Ryan\Programming\Zig\zig\lib\std\start.zig:439:53: 0xfc11f5 in WinStartup (21756.exe.obj)
    std.os.windows.ntdll.RtlExitUserProcess(callMain());
                                                    ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\start.zig:439:53: 0xfc120d in WinStartup (21756.exe.obj)
    std.os.windows.ntdll.RtlExitUserProcess(callMain());
                                                    ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\start.zig:439:53: 0xfc1215 in WinStartup (21756.exe.obj)
    std.os.windows.ntdll.RtlExitUserProcess(callMain());
                                                    ^

My first thought is that there's a miscompilation involved.

@SHIPWECK
Copy link
Author

It's the bare minimum error trace, basically saying that it was main that caused the error (no duh). The thing that makes this really sucky is that you don't get the error messages of debug mode because it works in debug.

I did some more experimenting using print debugging and narrowed down the issue to this function: std.process.Child.collectOutput. You can search the function in the standard, but the code seems to fail in-between infallible statements. Below is the standard library code and some of the print statements I used.

// ... std.process.Child.collectOutput
assert(child.stdout_behavior == .Pipe);
assert(child.stderr_behavior == .Pipe);

// my changes
std.debug.print("trying to assert that allocators are the same\n", .{});
// end my changes

// we could make this work with multiple allocators but YAGNI
if (stdout.allocator.ptr != stderr.allocator.ptr or
    stdout.allocator.vtable != stderr.allocator.vtable)
{
    // my changes
    std.debug.print("assertion failed, unreachable code reached\n", .{});
    // end my changes
    unreachable; // ChildProcess.collectOutput only supports 1 allocator
}

// my changes
std.debug.print("assertion passed, continuing...\n", .{});
// end my changes

 var poller = std.io.poll(stdout.allocator, enum { stdout, stderr }, .{
    .stdout = child.stdout.?,
    .stderr = child.stderr.?,
    });
// ... std.process.Child.collectOutput

For some reason, this program prints "trying to assert that allocators are the same" and then exits, without printing "assertion failed, unreachable code reached". I really don't know what to make of this, since it seems the unreachable isn't being reached, but the assertions preceding it succeed. The code seems to fail in-between fallible statements.

@squeek502
Copy link
Collaborator

Some more strange info:

  • Adding a std.debug.print call to collectOutput causes ReleaseSafe/ReleaseFast to start hitting that unreachable for me
  • Putting a print within the unreachable branch does get printed for me when the mode is ReleaseSafe/ReleaseFast (if there is also a print call before the conditional)
  • Seems to be partially related to the use of std.heap.page_allocator; the same test program using a GeneralPurposeAllocator instead does not fail in any release mode
  • The values of stdout.allocator/stderr.allocator look equal if you print them out
  • The result of the conditions themselves are both false if your print them out, so there's no obvious reason why the branch would be taken

My naive guess is one of:

  • Unrelated miscompilation that happens to be manifesting in this particular way
  • Somehow the optimizer is evaluating that particular conditional as inherently true at compile-time and therefore optimizing based on that (meaning it's treating that entire function as unreachable basically)
  • Stack corruption is occurring in some way due to a bug in the standard library code

I don't have enough experience with these types of bugs to be able to differentiate between those possibilities, though.

@rohlem
Copy link
Contributor

rohlem commented Oct 21, 2024

std.log.err("{x} != {x}", .{@intFromPtr(stdout.allocator.ptr), @intFromPtr(stderr.allocator.ptr)}); before that if prints error: 6 != 3 for me in ReleaseSmall, so the allocator pointer's aren't the same (and are suspiciously close to 0/null).

For print debugging in general, I don't think that a branch that includes unreachable has to do sensible debug printing in a safety-less build mode such as ReleaseSmall.
However, if you replace unreachable with a @panic call it should reliably print the text argument, yet the error still reproduces, so it's definitely a distinct cause from that unreachable.

Update: @alignOf(@TypeOf(stdout.allocator.ptr)) is 8, therefore the pointer values 3 and 6 are invalid (they should have to be a multiple of 8).
The invalid values in the comparison probably lead to the crash, and because that's an invalid state, the rest of the logic gets corrupted too.

Update2: Since the ptr value of std.heap.page_allocator seems to be initialized with undefined (as it's unused), the underlying issue might be undefined leading to an invalid value for the destination type.
I'm pretty sure there is already an issue for this, searching.

@squeek502
Copy link
Collaborator

squeek502 commented Oct 21, 2024

Since the ptr value of std.heap.page_allocator seems to be initialized with undefined (as it's unused), the underlying issue might be undefined leading to an invalid value for the destination type.

I think you figured it out. Setting page_allocator.ptr to a value that's not undefined mitigates the problem. Perhaps those global Allocator instances should set ptr to something more like the value returned from Allocator.alloc when the length is 0:

const ptr = comptime std.mem.alignBackward(usize, math.maxInt(usize), Slice.alignment);
return @as([*]align(Slice.alignment) T, @ptrFromInt(ptr))[0..0];

Maybe something like:

@ptrFromInt(std.mem.alignBackward(usize, std.math.maxInt(usize), @alignOf(*anyopaque)))

@rohlem
Copy link
Contributor

rohlem commented Oct 21, 2024

Related other issues:

Basically, the definition intends to express an undefined but valid value for a given target/result type,
which is different from an undefined value of unknown origin (and undefined in status-quo), which may hold any bit pattern, and leads to illegal behavior in more contexts.
(EDIT: Alternatively we could declare that the use case of comparing allocator pointers is illegal and unsupported for the allocator interface, in which case it's the fault of collectOutput.)
There exist special-purpose solutions like std.mem.alignBackwards as suggested above, though I would personally prefer a general solution (/ more safety) provided by the language.
In status-quo, the coercion of undefined to any sparse type (with invalid bit patterns) makes any and all usages of the resulting value unsound.

@squeek502
Copy link
Collaborator

squeek502 commented Oct 21, 2024

Note that there might be more going on than just invalid bit patterns, since if I add print calls in various places, the value of the ptr field constantly changes:

in main: 00
start of Child.run: 480
after ChildProcess.init call: 110
after stdout/stderr init, args.allocator.ptr = 100
 stdout.allocator.ptr = 2
 stderr.allocator.ptr = 26ddcaa0000
in collectOutput 11 != 7fff5213b44d

Isn't the comparison of ptr in collectOutput always unsafe/illegal if ptr is undefined?

EDIT:

Alternatively we could declare that the use case of comparing allocator pointers is illegal and unsupported for the allocator interface, in which case it's the fault of collectOutput.

Yeah, seems like the bug is either the ptr = undefined or the ptr != ptr. Could be mitigated by changing either one.

squeek502 added a commit to squeek502/zig that referenced this issue Oct 21, 2024
The use of undefined here meant that it was always unsafe/illegal to compare the ptr fields of Allocator/Random. However, this restriction is not mentioned anywhere and the ptr field *is* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are two ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), so I went with the second option.

Closes ziglang#21756
squeek502 added a commit to squeek502/zig that referenced this issue Oct 21, 2024
The use of undefined here meant that it was always unsafe/illegal to compare the ptr fields of Allocator/Random. However, this restriction is not mentioned anywhere and the ptr field *is* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are two ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), so I went with the second option.

Closes ziglang#21756
squeek502 added a commit to squeek502/zig that referenced this issue Nov 25, 2024
… instances

Previously, Allocator implementations that did not store any state would set `ptr` to `undefined`. However, the use of undefined meant that it was always unsafe/illegal to compare the ptr fields of Allocator/Random. This restriction was not mentioned anywhere, though, and the ptr field *was* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are a few ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.
3. Make `ptr` optional and set it to `null` for implementations that don't store any state

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), and the second option has its own issues since the dummy value could inadvertently be equal to a real mapped pointer, so the 3rd option was chosen.
squeek502 added a commit to squeek502/zig that referenced this issue Nov 25, 2024
… instances

Previously, Allocator implementations that did not store any state would set `ptr` to `undefined`. However, the use of undefined meant that it was always unsafe/illegal to compare the ptr fields of Allocator. This restriction was not mentioned anywhere, though, and the ptr field *was* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are a few ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.
3. Make `ptr` optional and set it to `null` for implementations that don't store any state

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), and the second option has its own issues since the dummy value could inadvertently be equal to a real mapped pointer, so the 3rd option was chosen.
squeek502 added a commit to squeek502/zig that referenced this issue Nov 25, 2024
… instances

Previously, Allocator implementations that did not store any state would set `ptr` to `undefined`. However, the use of undefined meant that it was always unsafe/illegal to compare the ptr fields of Allocator. This restriction was not mentioned anywhere, though, and the ptr field *was* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are a few ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.
3. Make `ptr` optional and set it to `null` for implementations that don't store any state

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), and the second option has its own issues since the dummy value could inadvertently be equal to a real mapped pointer, so the 3rd option was chosen.
squeek502 added a commit to squeek502/zig that referenced this issue Jan 30, 2025
… instances

Previously, Allocator implementations that did not store any state would set `ptr` to `undefined`. However, the use of undefined meant that it was always unsafe/illegal to compare the ptr fields of Allocator. This restriction was not mentioned anywhere, though, and the ptr field *was* being compared in real code, e.g. `std.process.Child.collectOutput`, which can lead to undefined behavior in release modes (see ziglang#21756).

There are a few ways to address this:
1. Codify that comparing `ptr` is always illegal/unsafe, and remove all existing comparisons.
2. Set `ptr` to a legal dummy value, similar to the value returned by Allocator.alloc when a zero-length slice is requested.
3. Make `ptr` optional and set it to `null` for implementations that don't store any state

The first option seems like footgun central (at least until usage of `undefined` is fully safety-checked in safe modes), and the second option has its own issues since the dummy value could inadvertently be equal to a real mapped pointer, so the 3rd option was chosen.
squeek502 added a commit to squeek502/zig that referenced this issue Jan 31, 2025
squeek502 added a commit to squeek502/zig that referenced this issue Jan 31, 2025
squeek502 added a commit to squeek502/zig that referenced this issue Jan 31, 2025
squeek502 added a commit to squeek502/zig that referenced this issue Feb 2, 2025
alexrp pushed a commit to squeek502/zig that referenced this issue Feb 3, 2025
squeek502 added a commit to squeek502/zig that referenced this issue Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior
Projects
None yet
3 participants