stage2: remove operand from return instruction #5951

Vexu · 2020-07-29T11:46:42Z

Removing the operand from the return instruction has multiple benefits:

* more accurately models machine code
* removes separate handling of less-than-pointer-sized values
* makes it easier to generate defers without code duplication (see #283, "defer implementation: instead of duplicating the code at every exit path, make a basic block that exit paths point to")

andrewrk

I agree with the goal of removing the operand from the return IR instruction, but the semantics need a bit of work here. Hopefully the comments make it clear how to proceed.

andrewrk · 2020-07-29T17:07:43Z

src-self-hosted/codegen.zig

-                .register => {
-                    return self.fail(inst.base.src, "TODO implement storing to MCValue.register", .{});
+                .register => |reg| {
+                    try self.setRegOrMem(inst.base.src, elem_ty, .{ .register = reg }, value);


The semantics here are not quite right: this should behave as if the register holds a pointer address, and store to that address. This is why I think that, to remove the operand from the return instruction, we need a "Set the return value" instruction, which would not necessarily operate on a pointer.
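A minimal sketch of the distinction, using a toy machine rather than the real codegen.zig types. `Machine`, `storeThrough`, and `setRetVal` are hypothetical names invented for illustration; the only point is that a store dereferences the register, while a "set the return value" instruction writes the register itself.

```zig
const std = @import("std");

/// Hypothetical toy machine; none of these names come from codegen.zig.
const Machine = struct {
    regs: [4]usize = [_]usize{0} ** 4,
    mem: [16]usize = [_]usize{0} ** 16,

    /// store: the register holds an address, and the value is written to
    /// memory at that address. The register itself is unchanged.
    fn storeThrough(self: *Machine, ptr_reg: usize, value: usize) void {
        self.mem[self.regs[ptr_reg]] = value;
    }

    /// "Set the return value": the value is placed directly in the register
    /// chosen for returns. No pointer is involved.
    fn setRetVal(self: *Machine, ret_reg: usize, value: usize) void {
        self.regs[ret_reg] = value;
    }
};

pub fn main() void {
    var m = Machine{};
    m.regs[1] = 5; // r1 holds the address 5
    m.storeThrough(1, 42); // writes mem[5], leaves r1 alone
    m.setRetVal(0, 42); // writes r0 itself
    std.debug.print("r0={d} r1={d} mem[5]={d}\n", .{ m.regs[0], m.regs[1], m.mem[5] });
}
```

Lowering a store to a register destination as a plain register write, as the changed line does, corresponds to `setRetVal` here, not to `storeThrough`.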

andrewrk · 2020-07-29T17:11:40Z

src-self-hosted/astgen.zig

-        if (nodeMayNeedMemoryLocation(rhs_node)) {
-            const ret_ptr = try addZIRNoOp(mod, scope, src, .ret_ptr);
-            const operand = try expr(mod, scope, .{ .ptr = ret_ptr }, rhs_node);
-            return addZIRUnOp(mod, scope, src, .@"return", operand);
-        } else {
-            const fn_ret_ty = try addZIRNoOp(mod, scope, src, .ret_type);
-            const operand = try expr(mod, scope, .{ .ty = fn_ret_ty }, rhs_node);
-            return addZIRUnOp(mod, scope, src, .@"return", operand);
-        }
-    } else {
-        return addZIRNoOp(mod, scope, src, .returnvoid);


I see what you're going for here, and I like it. I was thinking along the same lines of removing the operand from the return instruction. However, I think we still need a "Set the return value" instruction, which would be used in the else branch here. For an explanation of why, see the review comments in codegen.zig on the store instruction.

Consider:

fn hello(cond: bool) T { return if (cond) foo() else bar(); }

Intended semantics are: the result location of hello is passed directly as the result location of foo and bar. In master branch, this is the case. However with these changes, foo and bar will each be given a temporary result location, which is then copied to hello's result location. This copy is incompatible with pinned memory, which is something we plan to support soon with #3803 and #2765.

I think the `if (nodeMayNeedMemoryLocation(rhs_node))` logic should remain.
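To make the pinned-memory concern concrete, here is a small sketch that spells out the result location as an explicit pointer parameter. This is not how the compiler represents result locations; `Pinned`, `helloForwarded`, and `helloCopied` are hypothetical names, and the `self` field merely stands in for any value that must not move after construction.

```zig
const std = @import("std");

/// Hypothetical stand-in for a pinned type: it records the address it was
/// constructed at, so moving it afterwards leaves `self` pointing elsewhere.
const Pinned = struct {
    value: u32,
    self: *const Pinned,
};

fn foo(result: *Pinned) void {
    result.* = .{ .value = 1, .self = result };
}

fn bar(result: *Pinned) void {
    result.* = .{ .value = 2, .self = result };
}

/// Forwarding: hello's result location is handed directly to foo or bar.
fn helloForwarded(cond: bool, result: *Pinned) void {
    if (cond) foo(result) else bar(result);
}

/// Copying: foo or bar writes to a temporary, which is then copied into
/// hello's result location.
fn helloCopied(cond: bool, result: *Pinned) void {
    var tmp: Pinned = undefined;
    if (cond) foo(&tmp) else bar(&tmp);
    result.* = tmp;
}

pub fn main() void {
    var a: Pinned = undefined;
    helloForwarded(true, &a);
    var b: Pinned = undefined;
    helloCopied(true, &b);
    std.debug.print("forwarded keeps its address: {}\n", .{a.self == &a});
    std.debug.print("copied keeps its address: {}\n", .{b.self == &b});
}
```

In the forwarded version the value is constructed in place, so the recorded address matches its final location. In the copied version the recorded address refers to the temporary, which is the kind of move that pinned memory would forbid.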

andrewrk · 2020-07-29T17:14:40Z

src-self-hosted/zir_sema.zig

+    const result_type = body.instructions[body.instructions.len - 1];
+    const val = try mod.resolveConstValue(&block_scope.base, result_type.analyzed_inst.?);
+    return val.toType();


The "set return value" instruction would be less brittle than looking at the last instruction too, I think.

Keeping the return instruction as simple as possible has multiple benefits:

* more accurately models machine code
* removes separate handling of less than pointer sized values
* makes it easier to generate defers without code duplication

andrewrk

I put some thought into this, and I have a vision for how this can be done cleanly, which takes into account defers. Let me know if you want to go for it, or if my comments are too confusing I would be happy to do a hand-off here.

andrewrk · 2020-07-30T18:10:29Z

src-self-hosted/Module.zig

@@ -1309,7 +1309,7 @@ fn astGenAndAnalyzeDecl(self: *Module, decl: *Decl) !bool {
                    !gen_scope.instructions.items[gen_scope.instructions.items.len - 1].tag.isNoReturn()))
                {
                    const src = tree.token_locs[body_block.rbrace].start;
-                    _ = try astgen.addZIRNoOp(self, &gen_scope.base, src, .returnvoid);
+                    _ = try astgen.addZIRNoOp(self, &gen_scope.base, src, .@"return");


This is missing a "set return value to void" instruction, which would be necessary for the "expected T, found void" compile error (when one forgets to return a value). I think we should keep the existing returnvoid instruction as well as the "return with an operand" instruction, and what this branch can do is add an additional "set the return value" instruction and "return control flow assuming the return value has already been set".

The idea is that we can have IR instructions with overlapping responsibilities; it's easy to have them call the appropriate functions during semantic analysis. This is in the interest of lowering memory usage: adding another enum tag to ir.Inst.Tag is free, so we may as well minimize the data we are allocating for these instructions.

However, another observation is that once we do defers, it would make sense to have one canonical "exit the function" path from any scope. I think we should actually remove the return control flow instruction.

So, here's my proposal:

* Add set_ret_val instruction, which does no control flow.
* Add set_ret_void instruction. Same as set_ret_val but the operand is assumed to be the void value. (This is simply to reduce memory usage.)
* Remove return and returnvoid instructions.

There would be no more return control flow logic. It would be implied at the end of the main outer block of a function, and early-return control flow would be managed with breaks. I think this would work well with how to codegen defer expressions. Note, however, that the return value would either be set with ret_ptr and writing through the pointer, or set_ret_val directly.

The astgen for return syntax will need to change, to use a combination of the "break" and "set_ret_val" instructions rather than emitting the now-deleted "return" control flow instruction.
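A rough sketch of the proposed shape, using a toy IR and interpreter. The instruction names `set_ret_val` and `set_ret_void` come from the proposal above; everything else (`Inst`, `RetVal`, `skip_unless`, `break_outer`, `run`, `earlyReturn`) is hypothetical and much simpler than the real ir.Inst. The point is that nothing in the body performs a return: early exits break out of the outer block, and the function has a single exit path at its end.

```zig
const std = @import("std");

/// Hypothetical toy IR, not the real ir.Inst.
const Inst = union(enum) {
    /// Sets the return value; does no control flow.
    set_ret_val: i64,
    /// Same as set_ret_val with the void value, so it carries no payload.
    set_ret_void,
    /// Skips the next `count` instructions when `cond` is false.
    skip_unless: struct { cond: bool, count: usize },
    /// Leaves the main outer block; this is how early return is expressed.
    break_outer,
};

const RetVal = union(enum) { unset, void_value, int: i64 };

fn run(body: []const Inst) RetVal {
    var ret: RetVal = .unset;
    var i: usize = 0;
    while (i < body.len) : (i += 1) {
        switch (body[i]) {
            .set_ret_val => |v| ret = .{ .int = v },
            .set_ret_void => ret = .void_value,
            .skip_unless => |s| {
                if (!s.cond) i += s.count;
            },
            .break_outer => break,
        }
    }
    // Falling off the end of the outer block is the single exit path.
    return ret;
}

/// Roughly `if (cond) return 1; return 2;` in the toy IR.
fn earlyReturn(cond: bool) [4]Inst {
    return [4]Inst{
        .{ .skip_unless = .{ .cond = cond, .count = 2 } },
        .{ .set_ret_val = 1 },
        .break_outer,
        .{ .set_ret_val = 2 },
    };
}

pub fn main() void {
    const body = earlyReturn(true);
    switch (run(&body)) {
        .int => |v| std.debug.print("returned {d}\n", .{v}),
        else => std.debug.print("no integer return value\n", .{}),
    }
}
```

Because `set_ret_void` has no payload, it costs only a tag, which matches the memory-usage argument made for it. A body that never sets the return value ends with `unset`, the case where a real compiler would have to report the missing return value.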

Vexu · 2020-07-30T19:34:49Z

You seem to have a clear idea of where you want to take this so I'd be happy to hand it off. I just want to split the value and the control flow so that I can try implementing defers.

pixelherodev · 2020-07-30T19:59:38Z

I intend to finish the IHEX backend (and thus the SPU Mark II PR) within the next day or two. I'd be happy to polish this off when I'm done if that helps. - Noam Preil

andrewrk · 2020-07-30T23:05:21Z

OK I'm going to close this. I wrote up a little bit here: #283 (comment)
This is going to depend on finishing Register Allocation and Stack Allocation Across Conditional Branches, so I'm going to hold off on trying to implement it just yet.

andrewrk requested changes Jul 29, 2020

stage2: remove operand from return instruction

cec245f

Vexu force-pushed the stage2-ret branch from 2438970 to cec245f on July 30, 2020 11:10

andrewrk requested changes Jul 30, 2020

andrewrk closed this Jul 30, 2020

Vexu deleted the stage2-ret branch June 13, 2021 08:17
