Integer coercion considers non-type bits #11263
Comments
Seems like "intended" behavior to me. To print an int you have to branch on it, and branching on undefined is undefined behavior, so anything could happen.
Basically, I'm not sure if I understand why you expect the
This isn't actually a problem that comes from undefined. It also comes from very defined values:
const std = @import("std");
pub fn main() !void {
    var value: [2]u4 = undefined;
    std.mem.set(u8, @ptrCast([*]u8, &value)[0..@sizeOf([2]u4)], 0xFF);
    std.debug.print("u4={}, usize={}\n", .{
        value[0],
        @as(usize, value[0]),
    });
    // prints u4=15, usize=255
}
When coercing any We either have to define padding bits of non-byte aligned integers to always be zero (then this issue wouldn't be a bug) or we have to fix the compiler implementation to ignore these bits (then this issue would be a bug). Right now, cc @SpexGuy
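The second option above, ignoring bits outside the type when widening, can be approximated at the user level. The following is a minimal sketch under stated assumptions, not a description of what the compiler does or should do: it reads the storage as plain `u8` and masks off the upper nibble before widening, so the result no longer depends on what the non-type bits contain. The helper name `widenLowNibble` is hypothetical and comes from neither the thread nor the standard library.

```zig
const std = @import("std");

// Hypothetical helper (an assumption for illustration): widen only the low
// 4 bits of a storage byte, explicitly discarding the upper 4 bits.
fn widenLowNibble(storage: u8) usize {
    // The mask is applied on a full u8, where all 8 bits are value bits,
    // so it cannot be treated as a no-op the way a mask on a u4 could be.
    return @as(usize, storage & 0x0F);
}

pub fn main() void {
    // Same bit pattern as in the snippet above: every bit of storage set.
    const storage = [2]u8{ 0xFF, 0xFF };
    std.debug.print("usize={}\n", .{widenLowNibble(storage[0])});
}
```

With every storage bit set, this prints `usize=15`, because only the low four bits survive the mask.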
@MasterQ32 both your code snippets invoke UB under the current stage1 compiler implementation and the spec defined by that implementation. Relevant bits from the LLVM language reference:
On load IR instructions:
https://llvm.org/docs/LangRef.html#load-instruction
On store IR instructions:
https://llvm.org/docs/LangRef.html#store-instruction
How we define these semantics long term is certainly up for discussion and we may want to move away from LLVM's approach, but I don't think calling this issue a bug is accurate.
I agree with @ifreund and @Hejsil, this looks like correct Related discussion: #7115
The main issue seems to be that the compiler makes certain assumptions about the (bytes involved in the) representation of I've had thoughts about (what I believe to be) the same general issue in issue #6784.
I'm not sure if this is the same issue, but when using Zig for binary file formats, sometimes undefined memory is dumped into the file. I don't have a minimal reproduction yet. Is it possible this is the cause? An integer is being cast to a smaller size somewhere, but the next byte(s) are still being written, and that results in undefined memory?
@Jarred-Sumner yes, the 4th padding byte of e.g. a That doesn't mean that your bug is necessarily caused by this; it could be caused by writing any structure with padding or without a well-defined memory layout to disk. The simplest way to solve that kind of bug would be to zero the memory before you write to it.
The implementation does not define anything, the language spec does, and it says "Converts an integer to another integer while keeping the same numerical value". Casting a
It has been "standardized" per the language spec, which is IMO the clear and obvious right definition.
@jibal There is no language spec currently. There is documentation, but it is incomplete. For things not covered by the documentation, an accepted language proposal, or similar, all we have is the stage1 implementation. For this implementation it was decided to rely on LLVM for arbitrary bit width integer support, and LLVM has the semantics I quoted above. The unsound operation in example 2 I'm talking about is pointer casting To repeat what I said in my original comment, how we define these semantics long term is certainly up for discussion and we may want to move away from LLVM's approach.
This is UB, but there should at the very least be safety checks for this if it is decided that it remains illegal behaviour.
Zig Version
0.10.0-dev.1261+6f986298c
Steps to Reproduce
Expected Behavior
Code prints
Actual Behavior
Code prints
It looks like the compiler is assuming that the upper bits of a u4 are 0, which doesn't happen when initializing to undefined.