audit analysis of undefined values; make it clear when undefined is allowed or not #1947

Open
andrewrk opened this issue Feb 11, 2019 · 2 comments
Labels
docs enhancement Solving this issue will likely involve adding new logic or components to the codebase.

@andrewrk
Member

andrewrk commented Feb 11, 2019

In IR analysis, Zig is currently inconsistent about the semantics of an operation on an undefined value. This issue is to make clear the rules for how it is supposed to work. Note that undefined values are different from undefined behavior.

  • In memory, an undefined value of type T takes up the same store size as a normal value of type T, and exists as any bit pattern within that store size. Thus by looking only within the store size of an undefined value it may be impossible to tell that it is an undefined value.
  • Undefined values semantically represent an extra state which is not possible to represent using any of the valid bit patterns of the underlying type. However, aside from the store size, the representation of an undefined value in memory is undefined; it can be any bit pattern. As an example, the value u8(undefined), in memory, could be any combination of bits that fits in @sizeOf(u8), which is 1. The value bool(undefined), in memory, could be any combination of bits that fits in @sizeOf(bool), which is also 1. So even though the only valid bit patterns of the type bool are 0b00000000 and 0b00000001, when the value is undefined, the byte which represents the storage of the bool value could be anything, including 0b00000010, 0b10101010, or 0b11111111. Therefore, because undefined values semantically represent an extra state, it is incorrect to assume that an undefined value of type T has a value in the set of valid values for type T.
  • When an expression has no side effects and no possible undefined behavior, and one or more of its operands is an undefined value which is read, the result of the expression is an undefined value. For example, the +% operator. Note that for the slicing operator, if the start is 0, the pointer value is not read, which makes this expression defined: (([*]u8)(undefined))[0..0]. Another example is @ptrCast(*i32, (*u32)(undefined)). Although 0x0 is not a valid bit pattern for the type *u32, it is a possible bit pattern within the store size of *u32, and so this expression is capable of producing an invalid bit pattern for the result type. However, @ptrCast is defined to have no possible undefined behavior because it is a no-op on the bit pattern.
  • Branching on an undefined value is undefined behavior, for example in the condition of an if expression. This can be caught at comptime, and caught at runtime once #211 (debug safety feature: runtime undefined value detection) is solved.
  • When an expression has possible undefined behavior and one or more of its operands is an undefined value, the expression causes undefined behavior if any combination of bit patterns within the store sizes of the undefined values would cause undefined behavior. For example, @intCast(u8, u16(undefined)). Another example is the + operator. However, if one of the operands of + is comptime-known to be 0 and the other is an undefined value, the result is an undefined value, because no bit pattern added to 0 causes overflow.
    The analysis code for every IR instruction should be audited, and tests added to enforce this behavior, especially for comptime code.

These rules should also be made clear in the language reference.
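
The rule for expressions with possible undefined behavior can be checked mechanically for small integer types by enumerating every bit pattern the undefined operand could hold. The following is a rough sketch in Python, not Zig compiler code; the helper names (`classify`, `checked_add_u8`, `wrapping_add_u8`, `int_cast_u16_to_u8`) are hypothetical and exist only for illustration.

```python
U8_MAX = 0xFF

def classify(undefined_bits, causes_ub):
    """Model of the rule above (hypothetical helper, not a Zig API).

    If any bit pattern of the undefined operand would trigger undefined
    behavior, the expression is undefined behavior; otherwise the result
    is an undefined value.
    """
    for pattern in range(1 << undefined_bits):
        if causes_ub(pattern):
            return "undefined behavior"
    return "undefined value"

def checked_add_u8(known):
    # `+` on u8 overflows when the mathematical sum exceeds 255.
    return classify(8, lambda p: known + p > U8_MAX)

def wrapping_add_u8(known):
    # `+%` wraps around, so no bit pattern can cause undefined behavior.
    return classify(8, lambda p: False)

def int_cast_u16_to_u8():
    # @intCast(u8, u16(undefined)) fails when the u16 does not fit in a u8.
    return classify(16, lambda p: p > U8_MAX)

print(checked_add_u8(0))      # undefined value
print(checked_add_u8(1))      # undefined behavior
print(wrapping_add_u8(200))   # undefined value
print(int_cast_u16_to_u8())   # undefined behavior
```

Adding a comptime-known 0 is the only `+` case on u8 that yields an undefined value in this model, because it is the only known operand for which no bit pattern overflows.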

@vi

vi commented Sep 19, 2019

Documentation may need updating to closely resemble what actually happens in LLVM.

In particular, the phrase "the value could be anything, even something that is nonsense according to the type." is suspicious and may mislead users.

Suggested example:

const warn = @import("std").debug.warn;

fn get_zero() u32 {
    var x : u32 = undefined;
    return x & ~x;
}

pub fn main() void {
    if (get_zero() == 0) {
        warn("Y\n");
    } else {
        warn("N\n");
    }
}

This prints Y in safe mode and N in optimized mode.
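
One way to see how both outputs are possible is to compare two models of reading an undefined value. The sketch below is in Python and is only a model, not Zig or LLVM code. It assumes the optimizer may treat each read of an undefined value as an independent arbitrary bit pattern, which is how LLVM documents its `undef` value; the comment above does not state this explicitly. The function names are hypothetical, and an 8-bit width stands in for u32 to keep the enumeration small.

```python
def always_zero_with_one_value(bits=8):
    """x & ~x when every read of x sees the same bit pattern."""
    mask = (1 << bits) - 1
    return all((x & (~x & mask)) == 0 for x in range(1 << bits))

def can_be_nonzero_with_independent_reads(bits=8):
    """x & ~x when each read of x may see a different bit pattern."""
    mask = (1 << bits) - 1
    return any(
        (a & (~b & mask)) != 0
        for a in range(1 << bits)   # pattern seen by the first read
        for b in range(1 << bits)   # pattern seen by the second read
    )

print(always_zero_with_one_value())             # True
print(can_be_nonzero_with_independent_reads())  # True
```

Under the first model the function can only return 0, matching the safe-mode output; under the second a non-zero result is permitted, which is consistent with the optimized output.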

@raulgrell
Contributor

Related: #8056
