Prototype for tracing through instrumentation #1

Draft
wants to merge 2 commits into
base: main
from

jasonz-dfinity
Owner

Why

When the code being benchmarked is complicated, it can be difficult to understand why a benchmark takes X instructions.

What

  • Update canbench-rs-macros to generate an additional query method with tracing enabled. This leaves the non-tracing behavior unaffected while allowing the tracing method to return different results.
  • Update canbench-bin:
    • Add a tracing flag for the binary
    • Create a new instrumented wasm (not modifying the existing one) where tracing is enabled:
      • For every tracing query method (added by canbench-rs-macros), call an exported function __prepare_tracing
      • Add a trace_func that calls ic0.performance_counter and persists the counter, along with the func_id, into the buffer
      • For every wasm function, move its original body into a block, and call trace_func before and after it
      • Collect the traces and normalize them
      • Convert traces into a flamegraph and write to the file system
  • Update canbench-rs:
    • Add an exported function to allocate the tracing buffer
    • When tracing is enabled, take a different code path to execute the benchmarked function, where the performance counter is also recorded to the buffer before and after it.
    • Provide a function to get traces from the buffer (called by the code generated by canbench-rs-macros)
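
The recording scheme described in the list above (a pre-allocated buffer, plus a counter reading before and after each function body) can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `FakeCounter` is a hypothetical stand-in for `ic0.performance_counter`, and `TraceBuffer`, `TraceEntry`, `Kind`, and `traced` are names invented for the example.

```rust
/// Whether a record marks entering or leaving a function.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Enter,
    Exit,
}

/// One trace record: the function id and the counter value at that point.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TraceEntry {
    func_id: u32,
    kind: Kind,
    counter: u64,
}

/// Hypothetical stand-in for `ic0.performance_counter`, advanced manually
/// so the sketch can run outside a canister.
struct FakeCounter {
    instructions: u64,
}

impl FakeCounter {
    fn read(&self) -> u64 {
        self.instructions
    }

    fn burn(&mut self, n: u64) {
        self.instructions += n;
    }
}

/// Trace buffer whose storage is allocated once, up front.
struct TraceBuffer {
    entries: Vec<TraceEntry>,
    capacity: usize,
}

impl TraceBuffer {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            entries: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Appends a record; returns false (and drops the record) once the buffer is full.
    fn record(&mut self, func_id: u32, kind: Kind, counter: u64) -> bool {
        if self.entries.len() >= self.capacity {
            return false;
        }
        self.entries.push(TraceEntry { func_id, kind, counter });
        true
    }
}

/// Runs `body` with a counter reading recorded before and after it,
/// mirroring "call trace_func before and after" the original function body.
fn traced<T>(
    buf: &mut TraceBuffer,
    clock: &mut FakeCounter,
    func_id: u32,
    body: impl FnOnce(&mut TraceBuffer, &mut FakeCounter) -> T,
) -> T {
    buf.record(func_id, Kind::Enter, clock.read());
    let out = body(&mut *buf, &mut *clock);
    buf.record(func_id, Kind::Exit, clock.read());
    out
}
```

Nesting `traced` calls produces an Enter/Exit pair per function, in call order, which is the shape a later flamegraph conversion needs. Dropping records when the buffer is full is a choice made for this sketch only.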

Considerations

  • Compared to the previous approach:
    • It no longer depends on ic-wasm; the instrumentation capability is adapted from it instead (although the two work quite differently, see below)
    • Unlike ic-wasm's instrumentation, which essentially duplicates the instruction accounting for every system API and wasm instruction type, this approach calls ic0.performance_counter, so there is no need to keep the instruction accounting up to date
    • Tracing is now enabled right before the benchmarked region and disabled right after it. This is significantly better when a lot of work is done before bench_fn is called, since those traces won't be emitted.
    • The tracing overhead is made as predictable as possible, and the traces are normalized to account for the overhead.
    • No update method is added, which gives a much stronger guarantee that benchmarks run with and without tracing do not interfere with each other
    • The tracing buffer is not in stable memory and is pre-allocated on the heap, so canisters using stable memory can use it without modifying their code
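
To illustrate what normalizing traces and turning them into flamegraph input could look like, here is a hedged sketch. It assumes a simple model in which every trace call adds the same fixed number of instructions, so the i-th record is inflated by i times that overhead; the actual normalization in this PR may differ. The type and function names (`TraceEntry`, `Kind`, `normalize`, `folded_stacks`) and the `func_<id>` frame labels are hypothetical. The output is the "folded stacks" form (`frame;frame count`) that common flamegraph tools consume.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Enter,
    Exit,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct TraceEntry {
    func_id: u32,
    kind: Kind,
    counter: u64,
}

/// Removes the assumed constant tracing overhead: the i-th record has been
/// preceded by i trace calls, so i * overhead is subtracted from its counter.
fn normalize(entries: &[TraceEntry], overhead_per_trace: u64) -> Vec<TraceEntry> {
    entries
        .iter()
        .enumerate()
        .map(|(i, e)| TraceEntry {
            counter: e.counter.saturating_sub(i as u64 * overhead_per_trace),
            ..*e
        })
        .collect()
}

/// Aggregates self-cost per call path, keyed by "func_a;func_b;...".
fn folded_stacks(entries: &[TraceEntry]) -> BTreeMap<String, u64> {
    // Each frame holds (func_id, counter at entry, instructions spent in callees).
    let mut stack: Vec<(u32, u64, u64)> = Vec::new();
    let mut folded: BTreeMap<String, u64> = BTreeMap::new();
    for e in entries {
        match e.kind {
            Kind::Enter => stack.push((e.func_id, e.counter, 0)),
            Kind::Exit => {
                // Build the call path while the exiting frame is still on the stack.
                let path = stack
                    .iter()
                    .map(|(id, _, _)| format!("func_{id}"))
                    .collect::<Vec<_>>()
                    .join(";");
                // An Exit without a matching Enter is skipped.
                let Some((_, start, children)) = stack.pop() else {
                    continue;
                };
                let total = e.counter.saturating_sub(start);
                *folded.entry(path).or_insert(0) += total.saturating_sub(children);
                // Charge the whole span to the parent's "callee" tally.
                if let Some(parent) = stack.last_mut() {
                    parent.2 += total;
                }
            }
        }
    }
    folded
}
```

Each map entry can be written out as one `path count` line. Attributing only self-cost to each path keeps the totals additive, which is what folded-stack consumers expect.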

Caveats

  • The traces are returned through the canister reply of the tracing methods, so they are subject to the message size limit. There is not yet a way to override the message size limit in the way the instruction limit can be overridden, but it is theoretically possible.
  • Pre-allocating the tracing buffer can affect heap memory allocation, which may cause the benchmarked function to behave slightly differently than it would otherwise.
