Skip to content

Proposal: Conditional Access operator for fields on nullable types #8392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mjoerussell opened this issue Mar 29, 2021 · 11 comments
Closed

Proposal: Conditional Access operator for fields on nullable types #8392

mjoerussell opened this issue Mar 29, 2021 · 11 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mjoerussell
Copy link

It seems like the options currently available to handle nullable values was more oriented towards situations like this:

const Test = struct {
    a: ?u32,
};

fn addTen(val: Test) u32 {
    const a = val.a orelse 0;
    return 10 + a;
}

However a common pattern for dealing with nullable values (in my experience at least) is this:

const Test = struct {
    a: u32,
};

fn addTen(maybe_val: ?Test) u32 {
    const inner_val = if (maybe_val) |v| v.a else 0;
    return 10 + inner_val;
}

This is why I'm proposing the conditional access operator: ?..

The basic idea is that you have a nullable struct and want to access the fields inside. If the struct itself is null, then you want to fallback on a default value. If you're dealing with a single nullable struct then it's easy enough to use the code above, but it gets annoying when dealing with 2 or more nullable structs:

fn addAllPlusTen(maybe_one: ?Test, maybe_two: ?Test, maybe_three: ?Test) u32 {
    const one = if (maybe_one) |one| one.a else 0;
    const two = if (maybe_two) |two| two.a else 0;
    const three = if (maybe_three) |three| three.a else 0;
    return 10 + one + two + three;
}

My proposal is to add a conditional access operator similar to what exists already for nullable types in Typescript. The idea would be that using a conditional access operator on a nullable type immediately returns null if the value is null, or allows accessing fields if it is not. Here's some examples:

fn example(maybe_val: ?Test) u32 {
    // If maybe_val is null then the conditional access operator will evaluate to null before trying to access the field a, meaning
    // the type of x must be ?u32.
    const x: ?u32 = maybe_val?.a;

    // The orelse operator can be used to provide a default value in the case of maybe_val being null just like normal null values
    const y: u32 = maybe_val?.a orelse 0;

    // Simpler way to provide a default value in above example
    return 10 + (maybe_val?.a orelse 0);
}

For a less contrived example, here's the situation that prompted me to create this issue - the ODBC function SQLForeignKeys uses nullable pointers to make the function operate in three different ways:

  1. Provide a null value for the foreign key table name and a non-null value for the primary key table name to get a list of all foreign keys that point to the primary key table
  2. Provide the inverse of the previous situation to get a list of all the foreign keys in the provided table name
  3. Provide both to get the foreign key column in the primary key table that points to the foreign key table

Here's an abbreviated version of this function's implementation if this proposal was accepted:

fn foreignKeys(pk_catalog_name: ?[]const u8, pk_schema_name: ?[]const u8, pk_table_name: ?[]const u8, fk_catalog_name: ?[]const u8, fk_schema_name: ?[]const u8, fk_table_name: ?[]const u8) void {
    // Library func takes a nullable pointer to each of these strings, plus the length of the strings
    const result = c.SQLForeignKeys(
        handle,
        pk_catalog_name?.ptr,
        pk_catalog_name?.len orelse 0,
        pk_schema_name?.ptr,
        pk_schema_name?.len orelse 0,
        pk_table_name?.ptr,
        pk_table_name?.len orelse 0,
        fk_catalog_name?.ptr,
        fk_catalog_name?.len orelse 0,
        fk_schema_name?.ptr,
        fk_schema_name?.len orelse 0,
        fk_table_name?.ptr,
        fk_table_name?.len orelse 0,
    );
    // finish processing ...
}

I think that having the option to write this function like this would be much nicer than having to create separate variables to hold each of the string lengths or wrapping everything in layers of if statements.

@BinaryWarlock
Copy link

BinaryWarlock commented Mar 30, 2021

👍 I have a lot of places where I've wanted this, it's incredibly tedious to have ifs and they don't work well with nested field accesses, e.g.:

foo?.bar?.baz

Good luck writing that in any sane manner.

It's worth having one for array accesses too -- foo?[i] or some such, where foo may be null.

I'm not sure how the syntax could work with ? already being a postfix orelse unreachable currently, but I find that existing case not very useful at all (the only cases that I use it, it's a lazy workaround for not having these null access semantics), I'd rather have it replaced with this one honestly. This is way more useful and safer.

Prior art: JavaScript/TypeScript, C#, etc. It's already widely implemented. (FWIW, C# also names the postfix non-null assertion ! which is more indicative of things possibly exploding.)

Zig's ? is also confusing because it's used with those different semantics in those popular languages.

@Vexu
Copy link
Member

Vexu commented Mar 30, 2021

Duplicate of #2816.

@mjoerussell
Copy link
Author

@Vexu Ah I see, I tried to search for duplicates but didn't get the name right. Do you know why that issue was originally closed? It doesn't seem like there was much disagreement on the original proposal.

@alexnask alexnask added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Mar 30, 2021
@alexnask alexnask added this to the 0.8.0 milestone Mar 30, 2021
@Vexu
Copy link
Member

Vexu commented Mar 30, 2021

Do you know why that issue was originally closed?

Likely the usual "don't add syntax sugar unless it prevents footguns/encourages better coding styles" reason since the goal is to keep Zig as simple as possible.

@mjoerussell
Copy link
Author

I totally understand the desire to keep the amount of symbols/operators in use to a minimum, but I really think that this is a case where the possible benefits for both writing and reading code outweigh the mental overhead of adding new syntax. We could expand on BinaryWarlock's example to see how it reduces the complexity of arguably simple operations:

// With this proposal
const x = foo?.bar?.baz;

// Current method
const x = if (foo) |f| 
    if (f.bar) |b|
        b.baz
    else null
else null;

// Alternate current method
var x: ?u32 = null;
if (foo) |f| {
    if (f.bar) |b| {
        x = b.baz;
    }
}

@lithdew
Copy link
Contributor

lithdew commented Mar 30, 2021

It definitely will reduce the amount of code when dealing with control flow, but I believe simplifying control flow with the syntax proposed is likely to increase the chance that developers footgun themselves (by either making it easier for developers to introduce logic errors in their code, or by making it easier to write less performant code with unnecessary branching).

Here's an example of how I do conditional branching with a disk-backed trie implementation I wrote a month back.

image

While a lot of the code is quite lengthy, scoping conditional accesses out in blocks made it easier to debug the code while I was implementing it (and lets me consider that for some of the code, I could encapsulate them out into functions when need be). Additionally, typing the whole plethora of more code needed also made me really think a whole lot harder about how I should be branching and how potentially-null variables should be dealt with when null.

With this conditional access operator, as a tradeoff for having my code coming out "aesthetically less lengthy", most likely it would take a lot more work debugging such code while I was working on it.

Perhaps additions to the syntax in Zig might be a good idea to simplify the amount of branching that goes on in a lot of Zig code, but the conditional access operator in my opinion wouldn't be the way to go.

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 30, 2021

I think this sugar does encourage safer behavior. The "easy" way to get the value of an optional is with .?, which introduces UB if the value is null. The "right" way, to check for null, is much more verbose, and requires introducing a redundant variable name: if (x) |v| v.foo else default. This is an incentive to use the UB introducing form, which is unsafe. This sugar encourages the safer form, so I think it's worth considering.

@mjoerussell
Copy link
Author

@lithdew Thanks for providing this example. I definitely agree that the bulk of code here would not benefit from a conditional access operator. Specifically, I think that the load: blocks in your example wouldn't benefit from something like this for a few reasons:

  1. It appears that node is not nullable, only node.left or node.right are, which are examples of the nullability cases that Zig already handles well.
  2. Your requirements for handling nullable values within those blocks are more complex than just "get the field if the struct is available, or null if it's not".

I think that the only place in your code where I could see this proposal being used is here:

const maybe_left_hash = if (maybe_left_node) |left_node| left_node.hash else null;
const maybe_right_hash = if (maybe_right_node) |right_node| right_node.hash else null;

// Becomes...
const maybe_left_hash = maybe_left_node?.hash;
const maybe_right_hash = maybe_right_node?.hash;

// Or even without assigning to a separate variable first
node.hash = Node.hash(key, node.value, maybe_left_node?.hash, maybe_right_node?.hash);

If I understand your concern correctly, it's that giving developers easier ways to deal with nullability will make it more likely that they'll inappropriately shoehorn it into cases that require more complex solutions, like your load: blocks above. But I'm not convinced that that's enough justification for making correct code more cumbersome to write.

@batiati
Copy link
Contributor

batiati commented Mar 30, 2021

You can't shoot yourself using the proposed ?. operator, since it never unwraps the optional type.
But we need to consider that ?. can be easily misplaced with the .? operator, which is a big footgun!

Introducing ?. should require a review in .? with possible breaking changes.

// That's ok, value still needs to be unwrapped
var value = a?.b?.c;

// Really unsafe, can reference null without checking it
var value = a.?.b.?,c;

@lithdew
Copy link
Contributor

lithdew commented Mar 30, 2021

I think this sugar does encourage safer behavior. The "easy" way to get the value of an optional is with .?, which introduces UB if the value is null. The "right" way, to check for null, is much more verbose, and requires introducing a redundant variable name: if (x) |v| v.foo else default. This is an incentive to use the UB introducing form, which is unsafe. This sugar encourages the safer form, so I think it's worth considering.

Wouldn't the fix otherwise be to discourage the use of .? rather than introduce new syntax sugar?

@lithdew Thanks for providing this example. I definitely agree that the bulk of code here would not benefit from a conditional access operator. Specifically, I think that the load: blocks in your example wouldn't benefit from something like this for a few reasons:

  1. It appears that node is not nullable, only node.left or node.right are, which are examples of the nullability cases that Zig already handles well.
  2. Your requirements for handling nullable values within those blocks are more complex than just "get the field if the struct is available, or null if it's not".

I think that the only place in your code where I could see this proposal being used is here:

const maybe_left_hash = if (maybe_left_node) |left_node| left_node.hash else null;
const maybe_right_hash = if (maybe_right_node) |right_node| right_node.hash else null;

// Becomes...
const maybe_left_hash = maybe_left_node?.hash;
const maybe_right_hash = maybe_right_node?.hash;

// Or even without assigning to a separate variable first
node.hash = Node.hash(key, node.value, maybe_left_node?.hash, maybe_right_node?.hash);

If I understand your concern correctly, it's that giving developers easier ways to deal with nullability will make it more likely that they'll inappropriately shoehorn it into cases that require more complex solutions, like your load: blocks above. But I'm not convinced that that's enough justification for making correct code more cumbersome to write.

Apologies, I should have pasted the text of the code block to make it easier to copy-paste 😅.

A specific case that I worry about is the chaining of multiple conditional access operators i.e. a?.b?.c?.d?.e orelse return error.A. Given how easy it is to type over its verbose equivalent, developers would be incentivized (at least I certainly would be) to not handle the edge cases when unwrapping variables a, b, c, or d, and lazily write just one case to return error.A instead.

A counter-argument would be that developers could use the same conditional access operator to handle each case when unwrapping a, b, c, or d, but in that case, the syntax sugar does not help at all as you could just equivalently write:

const b = a orelse return error.A;
const c = b orelse return error.B;
const d = c orelse return error.C;
const e = d.e;

Citing back to my example of a disk-backed trie, here is a case of an obvious footgun where additional handling should've been done, but was not done because it was just so much more easier to chain multiple conditional access operators and handle all null conditions with just one case:

const data = node?.next?.next?.data orelse return error.NodeNotFound;
// Was it really the case that we should have returned an error stating that the node was not found?
// Or was it an invariant that was broken, such that we should have put `unreachable` instead?
// Or was it that while we were writing code, we forgot to initialize `node?.next?.next`?
// Or, if `node?.next` is null where `node` is null, should we have a case where we initialize `node`?
// ????

Another case that I worry about is that such syntax sugar would incentivize a lot of redundant branching should multiple conditional access operators be chained i.e:

const item_1 = a?.b?.c?.d?.item_1 orelse return error.A;
const item_2 = a?.b?.c?.d?.item_2 orelse return error.B;

With the code above, nullability checks would be performed twice as much as needed. I see two counter-arguments here:

  1. A developer could figure out to unwrap a?.b?.c?.d in a separate variable.
  2. The optimizer would merge branches together.

For the first point, when writing code free-style, most likely such a refactor would be something you would only consider after you finish writing your code and clarify your codes' correctness. With Zig's current syntax, I would not have performed any redundant branching at all in the first place given how verbose it is to unwrap and handle a?.b?.c?.d. With the current syntax, I would have created a variable for its result before writing any other code that accesses variable d, as otherwise, I would need to copy 4 to 5 lines of code everywhere (rather than just 1 line of code utilizing the conditional branching operator) to access variable d while attempting to make my code correct prior to refactoring.

For the second point, I strongly doubt that optimizers are able to merge such a case of duplicate redundant branching in several many situations (especially if there are additional references to variables b, or c, etc.).

You could see how much I fixate my arguments around redundant branching and code by the way, but that is solely because one of the nice properties I like about Zig's current syntax is how much it paves the right path for me to write more performant and correct code (the more verbose some code that I need to write is, Zig is pointing out to me that I should put more care into checking it while writing it) with footguns only to be used when I am sure I can justify its use in my code (i.e. unwrapping null with .?).

@mjoerussell
Copy link
Author

@lithdew I'll admit right off the bat that I have no idea how code optimizers work, so I'll take you word on this being difficult to optimize. I also see your concerns about this encouraging sloppy code with unnecessary branches, and I don't have a direct counter-argument to that either. You're bringing up really good points. I still believe that this idea - as originally proposed - has value, but let me propose an alternate version that may be closer to addressing the issues you brought up - only allow the conditional access operator inside if statements. Let me show some examples:

// Original proposal:
const x = a?.b?.c?.d;

// Alternate:
const x = if (a?.b?.c) |c| c.d else null;

// This would not immediately solve the issue you brought up regarding duplicate branching, but I believe it would make
// it easier for a developer to spot instances that could be refactored. For example, let's restate your example:
const item_1 = a?.b?.c?.d?.item_1 orelse return error.A;
const item_2 = a?.b?.c?.d?.item_2 orelse return error.B;

// With this alternate proposal, the code would be:
const item_1 = if (a?.b?.c?.d) |d| d.item_1 else return error.A;
const item_2 = if (a?.b?.c?.d) |d| d.item_2 else return error.B;

// Here the nested access into d is still simplified, but the if statement segregates it into an 
// easy-to-spot location, hinting to the developer that it could be made into its own variable.

// This also solves an issue that, as far as I'm aware, nobody has talked about yet
// Original:
const item: u32 = a?.b?.c?.d?.item orelse return error.A;
// Alternate:
const item: u32 = if (a?.b?.c?.d) |d| d.item else return error.A;

// The benefit here, in case it's not clear, is that someone reading this code can clearly see that d.item is not nullable,
// whereas in the original it wasn't clear.

// This can also make it easier to handle cases where you want to handle intermediate nullability separately:
const item = if (a?.b) |b| blk: {
    if (b?.c?.d) |d| break :blk d.item else return error.B;
} else return error.A;

Now this solution isn't perfect either, and in my opinion it's confusing to introduce syntax that can only be used in such a specific context (could be made to work in while conditions as well). An alternate-alternate could be:

if (a and a.b and a.b.c) |c| { ... }

Although I'm not sure that this is much better.

PS I see that this issue is closed now, sorry if continuing to post this is violating a rule but since I started before it was closed I just wanted to get it out as a last word. Since it seems like this isn't something we want I'll leave it here. Thanks for the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

8 participants