fix: SIGSEGV in valgrind due to incorrect optimization #2
@not-matthias is this the only way? Not inlining the instrumentation calls is an issue for the accuracy of the measurement, as we discussed already.
src/helpers/valgrind/callgrind.zig
@@ -48,6 +48,6 @@ pub inline fn startInstrumentation() void {
 /// Use this to bypass Callgrind aggregation for uninteresting code parts.
 /// To start Callgrind in this mode to ignore the setup phase, use
 /// the option "--instr-atstart=no".
-pub inline fn stopInstrumentation() void {
+pub export fn callgrind_stop_instrumentation() void {
Hmm, not being able to inline this is kind of an issue.
The reason I chose to export this was the following:
- The export will add 1 additional call to this function, which is not that much considering that we have a few instructions to set up/clean the stack before/afterwards.
- We can call this exported function directly from within the benchmark. We can't really control how the compiler lowers the function to assembly.
The callgrind_start_benchmark function doesn't cause any overhead since we directly return to the benchmark:
stop_benchmark when using export:
stop_benchmark when using inline:
As seen in the two examples above, the inlined version executes 10+ other instructions to set up the stack, which causes overhead. The only real way to ensure that we always call it at the start would be inline assembly, which carries an additional maintenance burden.
So my proposed solution is as follows:
- Keep the exports for starting and stopping valgrind, and call them in the root.
- If we are using valgrind for interpreted languages such as Python, the overhead of 10 instructions will be negligible. If we want to use valgrind for Rust or C++, we can call the callgrind_start/stop_benchmark functions directly.
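To make the proposal concrete, here is a minimal C sketch of the "exported, never inlined" shape. It only illustrates the call structure: `issue_request`, `request_log`, `workload` and `run_benchmark` are hypothetical stand-ins invented for this example (the real wrappers would issue Valgrind client requests), and `__attribute__((noinline))` is the GCC/Clang spelling, assumed here as the rough C counterpart of a Zig `export fn`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the Valgrind client requests: it only records
 * which request was issued, so the sketch runs without Valgrind. */
enum request { REQ_ZERO_STATS, REQ_START, REQ_STOP };

#define LOG_CAP 16
static enum request request_log[LOG_CAP];
static size_t request_count = 0;

static void issue_request(enum request r) {
    if (request_count < LOG_CAP) {
        request_log[request_count++] = r;
    }
}

/* External linkage plus noinline: the compiler must emit a real call, so
 * the requests cannot be merged into the caller's prologue or epilogue. */
__attribute__((noinline)) void callgrind_start_benchmark(void) {
    issue_request(REQ_ZERO_STATS);
    issue_request(REQ_START);
}

__attribute__((noinline)) void callgrind_stop_benchmark(void) {
    issue_request(REQ_STOP);
}

/* Hypothetical code under measurement: sum of squares below n. */
static unsigned long workload(unsigned long n) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < n; i++) {
        sum += i * i;
    }
    return sum;
}

/* A native benchmark calls the exported wrappers directly around the
 * measured region, at the cost of one call instruction per wrapper. */
unsigned long run_benchmark(unsigned long n) {
    callgrind_start_benchmark();
    unsigned long result = workload(n);
    callgrind_stop_benchmark();
    return result;
}
```

The trade-off is the one described above: each wrapper costs one extra call, but its position relative to the measured code no longer depends on how the compiler lowers an inlined body.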
a671f32 to 0fb92ee
Looks like it fails to compile in CI with export, even though it works locally. Very weird 🤔
I'll invest a bit more time to try to figure out what's going on and to see if we can fix it. If that doesn't work, I think it's best to just revert back to wrapping the valgrind header. There's little benefit in transpiling C -> Zig -> C.
EDIT: Turns out the issue is wrong inline assembly in the Zig std library. Using the inline assembly as defined in the valgrind header works.
    .x86_64 => asm volatile (
        \\ rolq $3, %%rdi ; rolq $13, %%rdi
        \\ rolq $61, %%rdi ; rolq $51, %%rdi
        \\ xchgq %%rbx, %%rbx
        : [_] "={rdx}" (-> usize),
        : [_] "a" (args),
          [_] "0" (default),
        : "cc", "memory"
    ),
However, this doesn't work in Zig for some reason:
I think the best way to deal with this is to just use the header from valgrind, then we don't have to deal with the inline assembly and differences between clang/gcc/zig. Inline assembly is currently also still unstable with major work going on (ziglang/zig#5241) so it's just safer to not yet rely on it.
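For comparison, here is roughly how the x86_64 client request looks in C. It is reconstructed from memory of valgrind.h rather than copied from it, so treat the details as assumptions to check against the real header. `client_request` and `running_on_valgrind` are names made up for this sketch, and the request code 0x1001 is what I believe `VG_USERREQ__RUNNING_ON_VALGRIND` to be. Outside Valgrind the four rotates add up to 128 bits, which leaves `%rdi` unchanged, and the `xchgq` is a no-op, so the default value comes back in `%rdx`.

```c
#include <stdint.h>

/* Sketch of a Valgrind client request (hypothetical helper name).
 * On x86_64 with GCC/Clang it emits the magic instruction sequence;
 * on any other target it simply returns the default value. */
static inline uintptr_t client_request(uintptr_t default_value,
                                       uintptr_t request,
                                       uintptr_t a1, uintptr_t a2,
                                       uintptr_t a3, uintptr_t a4,
                                       uintptr_t a5) {
    /* Valgrind reads the request code and its arguments from this array. */
    volatile uintptr_t args[6] = { request, a1, a2, a3, a4, a5 };
    uintptr_t result = default_value;
#if defined(__x86_64__) && defined(__LP64__) && defined(__GNUC__)
    __asm__ volatile(
        "rolq $3,  %%rdi ; rolq $13, %%rdi\n\t"
        "rolq $61, %%rdi ; rolq $51, %%rdi\n\t"
        "xchgq %%rbx, %%rbx"
        : "=d"(result)                        /* result comes back in rdx  */
        : "a"(&args[0]), "0"(default_value)   /* rax = args, rdx = default */
        : "cc", "memory");
#else
    (void)args; /* no magic sequence on this target in this sketch */
#endif
    return result;
}

/* Assumed request code for "am I running on Valgrind?"; verify against
 * valgrind.h before relying on it. Returns 0 when run natively. */
static inline uintptr_t running_on_valgrind(void) {
    return client_request(0, 0x1001, 0, 0, 0, 0, 0);
}
```

In practice, including the valgrind header and using its macros avoids maintaining this sequence by hand for every compiler and architecture, which is the direction suggested above.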
To summarize the final results:
9d8502e to 709cb61
looks good, thanks for the detailed analysis
709cb61 to acc213d
See Linear issue for details. The main issue was having zeroStats and startInstrumentation in one function (with both being inlined). This seemed to confuse the compiler. Exporting one of the functions (startInstrumentation in this case) fixes the issue.
Also exported stopInstrumentation since we might want to call it manually later on to reduce the overhead (if there is one).
(I'll add a CI step tomorrow: just test-c)