`wasmparser`: 15-30% performance regressions from v228 -> v229 #2180
I just finished updating Wasmi to the new `wasm-tools` v229, coming from v228. `wasmparser` parsing and validation made up a larger chunk of the work, so that's probably the real regression in `wasmparser`.

Wasmi PR: wasmi-labs/wasmi#1501

Benchmarking before and after revealed the following performance regressions:



Comments
Would you be able to help profile and track down where this is coming from? Perhaps via bisection?
Yes, I can do that.
@alexcrichton The 30% parse and validation regression first appears here.
The commit right before it does not have the regression.
I have also run benchmarks on the newly released version. cc @keithw
I can replicate this with the in-repo benchmarks.
I did some measurements on parsing (without validating) a single huge module on an unloaded AMD Ryzen 7 PRO 4750U, comparing two release builds built with rustc 1.86.0 (05f9846f8 2025-03-31). For the "before", I tested the commit just before that PR (90c156f), backporting the "full parse" logic from https://github.com/bytecodealliance/wasm-tools/blob/main/src/lib.rs#L299 (with the exception of the `ops.finish()?;` line, since the "before" parser doesn't have a `finish` function to check anything at the end of a function). For the "after", I used that PR (0354dde). In both cases I tested the same module, and I was surprised to find that in this "single huge module" test, I didn't see a regression.
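For reference, a minimal sketch of what such a "full parse" loop can look like with wasmparser's public API (my own sketch, not the exact harness used for these measurements):

```rust
use wasmparser::{Parser, Payload, Result};

/// Decode every payload and every operator of every function body,
/// without running the validator.
fn full_parse(wasm: &[u8]) -> Result<()> {
    for payload in Parser::new(0).parse_all(wasm) {
        if let Payload::CodeSectionEntry(body) = payload? {
            let mut ops = body.get_operators_reader()?;
            while !ops.eof() {
                ops.read()?;
            }
            // The `ops.finish()?` end-of-body check mentioned above is omitted
            // here, matching the "before" parser which has no such method.
        }
    }
    Ok(())
}
```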
For parsing+validation, again with this single huge module, I see a roughly 3% slowdown. On the other hand, `cargo bench` on the spec and local testsuite (which has thousands of mostly tiny modules) definitely documents a regression. So I'm wondering: are you seeing a big slowdown on parsing or parse+validating individual (large) modules, and if so, are you able to share an example module? Or should we be looking at a slowdown related to parser creation/startup time on lots of small modules, and that's getting washed out in the noise on my "single huge module" test?
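For the parse+validation side, the corresponding one-shot entry point looks roughly like this (again a sketch, assuming the module bytes are already in memory):

```rust
use wasmparser::Validator;

/// Parse and validate an entire module in one pass.
fn parse_and_validate(wasm: &[u8]) -> wasmparser::Result<()> {
    let mut validator = Validator::new();
    validator.validate_all(wasm)?;
    Ok(())
}
```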
@keithw I am not sure I can follow the reasoning behind all your steps above, but let me answer your questions, and sorry for the delay in doing so. First of all, I am on an Apple M2 Pro and have no access to other machines for benchmarking, and from what I remember I ran the benchmarks on Rust 1.85.
I saw 15% regressions across the board for large modules such as spidermonkey. You can view the Wasmi benchmarks here. These are some of the Wasm files I use to benchmark. Why did you remove the call to `ops.finish()`?
Hmm, I'm still having trouble replicating this on x86-64 with spidermonkey.wasm. I wonder if this might be an aarch64 vs. x86-64 issue or something else. For validation, here's what I get on commit 0354dde (that PR):
And with the same command run 1,500 times on commit 90c156f (just before that PR):
So I'm seeing a slowdown in single-threaded validation performance of about 3.9%. If you're also talking about validation, can I ask what kinds of absolute numbers you're able to measure for the same pair of commits on your machine?
The `Parser` only gained a `finish` function in that PR, which is why the "before" measurement doesn't call it.
The largest regressions (~30%) I saw were when Wasmi used `wasmparser` without validation.
Did you manage to replicate this on aarch64? I start to wonder if the slowdown is connected to the way Wasmi uses `wasmparser`, though this would not explain why the in-repo benchmarks regress as well.
I'm like @keithw in that I've reproduced a 3-5% regression in this repository's benchmarks. There appears to be something weird going on with wasmi's benchmark build and I don't know what; I removed all modification of my new code and still couldn't pin it down. My hunch at this time is that this is right on the edge of something like CPU cache lines, registers being spilled, some inlining threshold, or something like that. I don't know how to narrow it down further, but IMO this isn't too too actionable. It's possible, I think, to use a different representation here, but I'm at a bit of a loss as to how best to address this myself.
Hmmm, this is strange! On my machine I also see those 30% regressions for the in-repo benchmarks, though not all of them.
@Robbepop wanted to bump my question -- are you able to report the absolute cycle counts or times (for the same pair of commits) on your machine?
Hey @keithw, sorry, I didn't get that question. I have tried to find programs on macOS that can provide cycle counters but found none. Do you have ideas for how to test this more accurately on macOS? I am also not sure how I can best share the findings of the benchmark runs. Maybe @alexcrichton is right with his suspicion that it is indeed not about cycles but about "CPU cache lines, registers being spilled, some inlining threshold", inferior branch prediction, etc.
@Robbepop oh you're right about the in-repository benchmark regressions; at the time I was only looking at the validation of spidermonkey and ignoring all others. I can reproduce, for example, a 30% regression in parsing all tests. I believe this is due to the stack of blocks being managed where they weren't before. If I replace that vector with a simple counter, most of that regression goes away.

Can you confirm/deny whether the benchmarks in wasmi are validating? Locally, as is somewhat expected, the biggest regressions are in "just parse the bytes" tests and the smaller regressions are in "ok, also validate" tests. That may help explain the discrepancy in wasmi as well. Overall, though, wasmi is less prone to inlining issues due to enabling LTO, so only the structural changes should matter there.

@keithw the benchmarks that @Robbepop is mentioning are located in the wasmi repository and are defined with criterion, which measures wall time across iterations. I'd be wary of measuring regressions with wall time alone. I'm certainly not opposed to adding more `#[inline]` annotations where they help.
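For context, a criterion benchmark of that style looks roughly like this (a minimal sketch; the file layout, fixture path, and names are assumptions, not Wasmi's actual harness):

```rust
// benches/parse.rs -- hypothetical layout
use criterion::{criterion_group, criterion_main, Criterion};
use wasmparser::Validator;

fn bench_validate(c: &mut Criterion) {
    // Hypothetical fixture; criterion measures wall time across many iterations.
    let wasm = std::fs::read("benches/fixtures/module.wasm").unwrap();
    c.bench_function("validate/module", |b| {
        b.iter(|| Validator::new().validate_all(&wasm).unwrap());
    });
}

criterion_group!(benches, bench_validate);
criterion_main!(benches);
```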
@alexcrichton let me go through the Wasmi benchmarks I pasted above and explain what they do. In all of the above benchmarks, the non-function-body parts of a Wasm module are entirely parsed and validated up front, while the treatment of function bodies differs between them.
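To make that concrete, here is a rough sketch (my assumption of the general shape, not Wasmi's actual code) of eagerly parsing and validating everything except function bodies while deferring per-function validation:

```rust
use wasmparser::{FunctionBody, Parser, Result, ValidPayload, Validator};

/// Parse and validate all non-code payloads eagerly; collect function bodies
/// so that their validation can be deferred (e.g. until first use).
fn parse_module_lazily(wasm: &[u8]) -> Result<Vec<FunctionBody<'_>>> {
    let mut validator = Validator::new();
    let mut deferred = Vec::new();
    for payload in Parser::new(0).parse_all(wasm) {
        match validator.payload(&payload?)? {
            // The validator hands back each function body instead of
            // validating it immediately; stash it away for later.
            ValidPayload::Func(_to_validate, body) => deferred.push(body),
            _ => {}
        }
    }
    Ok(deferred)
}
```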
I suppose if one applies all those optimizations to `wasmparser`, these regressions would likely shrink.
I think this is incorrect thinking, as inline annotations also act as hints to the compiler. LTO and inlining hints in general are useful in combination if applied correctly.
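For illustration, this is the kind of hint being discussed (a trivial example of my own, not a function from wasmparser):

```rust
// `#[inline]` makes the function body available for inlining across crate
// boundaries even without LTO, and nudges the compiler's inlining heuristics.
#[inline]
pub fn is_last_leb128_byte(byte: u8) -> bool {
    byte & 0x80 == 0
}
```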
Hm ok, I'm a bit surprised then; in pure validation cases in this repository nothing comes close to the 30% regression you're seeing in wasmi.
Yes and no. This was added to validate the distinction between malformed and invalid modules. The non-spec-compliance aspect comes about insofar as wasmparser will parse wasms that are malformed, or technically syntactically invalid. Such invalid wasms will still be caught by the validator, however. IMO the answer to your question, though, is "no": a simple counter should suffice to keep passing the spec tests.
This commit relaxes a check added in bytecodealliance#2134 which maintains a stack of frame kinds in the operators reader, in addition to the validator. The goal of bytecodealliance#2134 was to ensure that spec-wise-syntactically-invalid-modules are caught in the parser without the need of the validator, but investigation in bytecodealliance#2180 has shown that this is a source of at least some of a performance regression. The change here is to relax the check to still be able to pass spec tests while making such infrastructure cheaper. The reader now maintains just a small `depth: u32` counter instead of a stack of kinds. This means that the reader can still catch invalid modules such as instructions-after-`end`, but the validator is required to handle situations such as `else` outside of an `if` block. This required some adjustments to tests as well as some workarounds for the upstream spec tests that assert legacy exception-handling instructions are malformed, not invalid.
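A rough illustration of the idea described above (my own simplified sketch; names and structure do not match the actual wasmparser internals): track only how many frames are open, and reject further operators once the outermost frame has been closed.

```rust
/// Hypothetical, simplified reader state illustrating the depth-counter idea.
struct OpReaderState {
    /// Number of currently open frames; a function body starts with one.
    depth: u32,
    /// Set once the outermost frame's `end` has been consumed.
    done: bool,
}

enum SimpleOp {
    Block,
    Loop,
    If,
    Else,
    End,
    Other,
}

impl OpReaderState {
    fn new() -> Self {
        OpReaderState { depth: 1, done: false }
    }

    fn visit(&mut self, op: &SimpleOp) -> Result<(), &'static str> {
        if self.done {
            // Still caught without a stack: instructions after the final `end`.
            return Err("operator after the function's final `end`");
        }
        match op {
            SimpleOp::Block | SimpleOp::Loop | SimpleOp::If => self.depth += 1,
            SimpleOp::End => {
                self.depth -= 1;
                if self.depth == 0 {
                    self.done = true;
                }
            }
            // With only a counter, `else` outside of an `if` can no longer be
            // detected here; the validator has to catch that case instead.
            SimpleOp::Else | SimpleOp::Other => {}
        }
        Ok(())
    }
}
```

The trade-off is exactly the one stated above: cheaper bookkeeping in the reader, with frame-kind-specific checks deferred to the validator.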