andrewrk opened this issue on May 3, 2020 · 4 comments

Labels: accepted (This proposal is planned.) · frontend (Tokenization, parsing, AstGen, Sema, and Liveness.) · proposal (This issue suggests modifications. If it also has the "accepted" label then it is planned.)
The main idea here is that (1) each global declaration in zig directly maps to a symbol in the output binary and (2) within each symbol, the code is Position Independent, not only with respect to itself, but also with respect to the other symbols it references. Each reference to a global declaration is indirect, through the table of offsets. This accomplishes 2 things:
A symbol can be relocated within the output binary, e.g. imagine that "Hello, world!" was changed to "lllll, world!". And further imagine that the symbol offsets table needed to grow, so that the msg symbol needed to move to later in the file, below _start. The only thing that would need to change is (1) moving the msg data to the new location, and (2) updating the table of offsets. Even if msg was referenced 100 times, those 100 references would be unchanged, despite the fact that it moved around in the address space.
Really powerful hot code swapping. The process could be paused with e.g. ptrace, and then updated symbols would, rather than being updated in place, be appended only. The table of offsets would be updated to point to the new symbols. The process would then be resumed. Function calls which were in progress (all the way up the stack) would complete using the old function code; new function calls, however, would go to the new function. Any instruction addresses captured for debug info purposes would remain valid throughout any number of hot code swaps. There are a lot of different ways this could go, but this demonstrates what a table of offsets accomplishes.
I'm pretty sure I just reinvented the Global Offset Table so it might make sense to read about how that works and just use that.
andrewrk added the proposal and frontend labels on May 3, 2020.
This is working out nicely. Functions will go through .got.plt in a "trampoline" style, so that codegen emits a jmp to a hard-coded address in the offset table, which in turn holds a call to a hard-coded address. The CPU sees only direct jumps to hard-coded addresses, but it is still "indirect" in the sense that there is only one place for codegen to edit.
Did you measure the performance impact? There's gotta be some; there's now an extra instruction in every function call, and the code size is increased by all the trampolines.
Reflecting @tbodt's concern: using the GOT to fetch the branch target with an indirect call is more expensive on the first call, since it resolves in the CPU backend rather than the front end, but it uses only one BTB entry. The trampoline method, by contrast, works the CPU front end a bit harder in steady state. This would matter most in code with many very short functions/methods, where you pay the two- vs. one-branch overhead more often and stress BTB capacity with the extra entry for each trampoline. As for cache impact, the trampoline method forces the extra bytes onto the instruction side, whereas with indirect calls the GOT offsets can live on the data side. Obviously these impacts will vary greatly across microarchitectures and workloads.
This strategy is used for scopes that are configured to be compiled in Debug mode. I expect in real world use cases, applications will compile some, perhaps most, of their dependency packages in one of the release modes, as well as the hot code paths of the application. With per-scope granularity of optimization mode, I expect the entire surface area of hot paths to be fully optimized, so these concerns should not apply there.
Related issues: #1535 and #68
Here's an example of how to lay out code in the binary:
example.zig
example.asm