Skip to content

sema: add OOB safety check for by-length slice of array #18598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dweiller
Copy link
Contributor

@dweiller dweiller commented Jan 17, 2024

Fixes #18382.

This adds a out-of-bounds safety check for by-length slices of arrays with a comptime-known length but runtime known start.

Before merging:

Pre-existing compile error

The original fix via a compile error (#18410) was reverted due to false positive cases. There is a pre-existing compile error that has false-positive cases that are analogous to the compile error introduced in #18410 (examples below), though these cases are less likely to be triggered.

The pre-existing compile error that has false positives is this one:

test {
    var buf: [5]u8 = undefined;
    _ = buf[1..10];
}
t.zig:3:16: error: end index 10 out of bounds for array of length 5
    _ = buf[1..10];
               ^~

However this still triggers when it is guarded against by a runtime-known condition:

test {
    var buf: [5]u8 = undefined;
    var runtime_condition = false;
    _ = &runtime_condition;
    if (runtime_condition) {
        _ = buf[1..10];
    }
}
t.zig:6:20: error: end index 10 out of bounds for array of length 5
    _ = buf[1..10];
               ^~

I think it is less likely for real code that would otherwise work correctly to trigger this false-positive, since the start and end of the slice must be comptime-known (rather than just the length as with the false positive in #18410) but there may be some generic pattern working with arrays of different sizes that would trigger it.

Should I remove this compile error as well? If not should I make an issue about it or do we just wait to see if anyone encounters it in the wild?

@rohlem
Copy link
Contributor

rohlem commented Jan 17, 2024

Compile errors are better than runtime crashes.

hence I don't see any reason to remove the compile-time length check you mention.

Compare the behavior when passing a value of a different type for x: anytype.
Indexing out of bounds is just as fundamentally incorrect as trying to resolve void + 1 or void.foo.
As a user, I want to be informed of this happening in my code:

  • If it can happen i.e. it is comptime-reachable,
    then I want to be informed even if a particular run of the program doesn't hit that branch.
  • If it cannot ever happen, i.e. it is comptime-unreachable,
    then this should be communicated to the compiler: The if-condition should evaluate to comptime -known false.
    As long as this isn't the case, we're telling the compiler to generate invalid code,
    so the compile error seems like the appropriate ountcome to me.

@dweiller dweiller force-pushed the by-length-slice-safety branch from 06075e1 to 298fa7c Compare January 18, 2024 02:18
@andrewrk
Copy link
Member

Your example is not analogous at all. You may as well replace it with

const std = @import("std");

test {
    var runtime_condition = false;
    _ = &runtime_condition;
    if (runtime_condition) {
        @compileError("unreachable");
    }
}

which also, correctly, reports a compile error.

@dweiller
Copy link
Contributor Author

dweiller commented Jan 20, 2024

Your example is not analogous at all. You may as well replace it with

The thing I think is analogous is that what was happening in #18410 with the json parser was runtime checks that would prevent slicing with a (comptime-known) length greater than the array length, but as the condition was only runtime known the new compile error was hit. With #18410, you could have replaced the json parser code with a @compileError as well - it's just less obvious since it was in a generic function.

I don't think it's really correct to say either the case of #18410 or above are false-postives but rather compile errors that may be generated by code that would be correct (at runtime) and looks like a reasonable way to guard a slicing operation, but gets caught out by a particular condition being runtime known, despite sufficient information being available at comptime to avoid it if conditions are separated into comptime- and runtime-known parts properly (probably only going to happen in generic code and more likely with #18410's compile error). E.g. the json check could have been rewritten as:

 .partial_string_escaped_1 => |arr| {
-    if (i + arr.len > r.len) return error.LengthMismatch; // this is the original check, runtime known because `i` is runtime known, but in theory it could be known statically known that arr.len <= r.len after this
+    if (arr.len > r.len) return error.LengthMismatch; // comptime known condition, should prevent analysis of the following
+    if (i > r.len - arr.len) return error.LengthMismatch;
     @memcpy(r[i..][0..arr.len], arr[0..]);
     i += arr.len;
},

@rohlem
Copy link
Contributor

rohlem commented Jan 20, 2024

If I correctly understand the scenario in question (I didn't before), it can be simplified as
if ((x + 1) < 1) {@compileError("provably unreachable");}
having its body analyzed even though it is provably unreachable for any unsigned x's value.

The question is then:

  • Should a condition (x + a) < b be written as (a < b) and ((x + a) < b),
    to make it easier for the compiler to analyze it to comptime-known false
    for comptime-known a and b?
  • Or should we make comptime-resolution of arithmetic smarter
    to figure this out without a code change?

Note that ranged integers #3806 are one such way of making comptime-analysis smarter
which would solve this particular case.
(We wouldn't need the full ranged integers for this use case,
just tracking the lower bound of expressions - but ranged integers would also provide this.)

@dweiller
Copy link
Contributor Author

dweiller commented Jan 20, 2024

That's how I see it. My opinion is that we should have both compile errors

The question is then just whether we want to:

* require code to complicate a condition
  `(x + a) < b` into `(a < b) and ((x + a) < b)`, for `comptime`-known `a` and `b`,
  to make it easier for the compiler to analyze it to `comptime`-known `false`

* make `comptime`-resolution of arithmetic smarter
  to figure this out without a code change

That's how I see it, with a third option being to decide the comptime length error in #18410 is not good due to ergonomic reasons if it would come up a bunch and is a pain to avoid, though I don't think this would be the case.

@dweiller
Copy link
Contributor Author

dweiller commented Jan 21, 2024

Here is an example of a generic function that triggers the index out of bounds compile error in a similar fashion to what happened in std.json in #18410

fn sliceTo6(data: anytype, len: usize) ![]const u8 {
    if (len > 6 or len > data.len) return error.Invalid;
    return data[6 - len..6];
}

test {
    const data = [4]u8{ 1, 2, 3, 4 };
    try std.testing.expectError(error.Invalid, sliceTo6(data, 1));
}

const std = @import("std");
t.zig:3:26: error: end index 6 out of bounds for array of length 4
    return data[6 - len..6];
                         ^

Interestingly, slicing an array with a comptime-known start index greater than the length of an array is not a compile error, only the end index being greater is - this asymmetry seems odd to me; slicing the above array like data[6..6 + len] is not a compile error.

This adds a out-of-bounds safety check for by-length slices of arrays
with a comptime-known length but runtime known start/end.
@dweiller dweiller force-pushed the by-length-slice-safety branch from 298fa7c to 13ef9e6 Compare February 13, 2024 06:07
@andrewrk
Copy link
Member

#19374

@andrewrk andrewrk closed this Mar 21, 2024
@dweiller dweiller deleted the by-length-slice-safety branch March 21, 2024 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

no bounds checking when slicing s[start..][0..len] arrays and array pointers in certain cases
3 participants