Skip to content

Proposal: disallow direct assignment to x.? #17508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
xdBronch opened this issue Oct 13, 2023 · 5 comments
Open

Proposal: disallow direct assignment to x.? #17508

xdBronch opened this issue Oct 13, 2023 · 5 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@xdBronch
Copy link
Contributor

currently zig allows the following code to compile

var x: ?u32 = null;
x.? = 10

not only is this dangerous as it causes panics in safe modes and UB in unsafe modes, but it's just plain wrong, a user should simply do x = 10 instead. for newcomers to zig this code may be a logical conclusion to come to, it seemingly synergizes well with the syntax of pointers

var y: *u32 = ...;
_ = y.*; // access the value being pointed to
y.* = ...; // reassign the value being pointed to. this is perfectly fine
...
var x: ?u32 = 10;
_ = x.?; // access the underlying value
x.? = ...; // reassign the underlying value?

@InKryption pointed out to me that x.? as an lvalue semantically is useful and should still be allowed, e.g. &x.?. if we changed the semantic meaning of x.? this would cause the unwrapped value to be a temporary and therefore yield a constant pointer.

this pattern is also extremely uncommon. using sourcegraph i counted only 11 occurrences in total, 3 of which were simply a duplicated behavior test from within zig itself across zig, zig-bootstrap, and zig-spec. see:

comptime var maybe_pos_arg: ?comptime_int = null;
inline for ("ab") |x| {
_ = x;
maybe_pos_arg = 0;
if (maybe_pos_arg.? != 0) {
@compileError("bad");
}
maybe_pos_arg.? = 10;
any other case should be trivially fixed
i see no reason for this syntax to be allowed as it silently causes incorrect code and is almost never used

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 13, 2023
@Vexu Vexu added this to the 0.12.0 milestone Oct 13, 2023
@rohlem
Copy link
Contributor

rohlem commented Oct 13, 2023

it's just plain wrong

Just wanted to note that there is a possible use case.
When reading from it, x.? asserts that x isn't null. Similarly I would assume x.? = n to have the same assertion, i.e. be a terser equivalent of if (x == null) unreachable; x = n;.

The silent failure / UB is certainly wrong, all illegal behavior should be checked in safe build modes, see #2301 .
I agree that it's uncommon and maybe "not worth having", however that in itself doesn't make the syntactic construct "plain wrong".


On a related note, writing to &x.? when x == null can't have consistent behavior for all optional types:

  • For packed null-representations, as in pointer types, writing a non-null value changes the optional to be non-null.
  • For non-packed ones (an extra is_null bit that isn't affected by writing to the payload pointer), the optional stays null.

IMO this is ultimately the same edge case, and should be declared illegal behavior and safety-checked as well, to the effect that writing to &x.? if x == null makes the null-ness of x dirty/undefined, resulting in a panic if it is read from before x was overwritten again.

@xdBronch
Copy link
Contributor Author

doing this is checked in safe modes as I mentioned at the top of the post. by "silently causes incorrect code" I meant that it seems innocent if you're unaware of the effect it has which, imo, makes your code incorrect.

@andrewrk andrewrk modified the milestones: 0.12.0, 0.15.0 Oct 14, 2023
@RiscInside
Copy link

RiscInside commented Oct 14, 2023

I would like to point out that code generated with unreachable is a bit different. In principle,

x.? = 1;

should be equivalent to

if (x == null) unreachable;
x.? = 1;

However, LLVM isn't smart enough to pick up on this (and I doubt Zig self-hosted backend would be smart enough for this either). For instance, this Zig code

fn f(target: *?u64) void {
  if (target.* == null) unreachable;
  target.* = 1;
}

fn f2(target: *?u64) void {
  target.*.? = 1
}

compiles to this assembly on amd64

example.f:
        vmovups xmm0, xmmword ptr [rip + .L__unnamed_1]
        vmovups xmmword ptr [rdi], xmm0
        ret

example.f2:
        mov     qword ptr [rdi], 1
        ret

See https://godbolt.org/z/TEcosqjsx

.? assignment lets Zig compiler know that value has already been tagged properly.

@xdBronch
Copy link
Contributor Author

that's a fair observation but seeing how very rare this use seems to be, I'm guessing optimization is not the reason for it. as for the optimizers being smart enough, the self hosted might not be able to do that yet but it can be specifically curated to zig and it's patterns, this could be one such case.

@RiscInside
Copy link

RiscInside commented Oct 15, 2023

One way to still have good codegen here is to take address with & (so that its no longer a direct assignment)

fn f3(target: *?u64) void {
  (&target.*.?).* = 1;
}

This produces same exact code as direct assignment with LLVM and is convoluted enough for no beginner to arrive at by mistake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

5 participants