Talking with @tlively, we realized that `--roundtrip` will restructure code back into a cache-friendly form: it serializes the module and then reads it back, and when we read it, we allocate adjacent instructions contiguously in an arena. Imagine we begin with unoptimized code; optimizations then quickly add pointers to arbitrary places in memory, but doing a `--roundtrip` can "fix" that, and might be worth it if we run more optimizations afterwards.
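As a rough illustration of why re-reading helps, here is a minimal bump-arena sketch (this is not Binaryen's actual arena, just the general idea): nodes allocated in creation order land contiguously in large chunks, so instructions that are adjacent in the function body are also adjacent in memory, and a later pass walking the IR touches memory mostly sequentially.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified bump allocator: hands out consecutive slices of a large chunk.
// Reading a serialized module allocates nodes in traversal order, so the
// resulting IR has good spatial locality for subsequent passes.
struct Arena {
  static constexpr size_t ChunkSize = 1 << 16;
  std::vector<std::unique_ptr<uint8_t[]>> chunks;
  size_t used = 0;

  void* alloc(size_t bytes) {
    bytes = (bytes + 7) & ~size_t(7); // keep 8-byte alignment
    if (chunks.empty() || used + bytes > ChunkSize) {
      chunks.push_back(std::make_unique<uint8_t[]>(ChunkSize));
      used = 0;
    }
    void* ret = chunks.back().get() + used;
    used += bytes;
    return ret;
  }
};
```

In contrast, once optimization passes start replacing nodes, each replacement is a fresh allocation somewhere else on the heap, and pointer-chasing during later passes degrades into cache misses.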
To measure this, I took a large unoptimized Kotlin testcase I have. `-O3` takes 50 seconds, and a second `-O3` after it takes 25 seconds (it makes sense that it would be faster, since after the first cycle there is a lot less code). Adding a `--roundtrip` between the two adds 2 seconds for the roundtrip itself, but makes the total time 2 seconds faster. So, ignoring the roundtrip's own time, we gain 4 seconds on the second `-O3`, which is something like 15% faster.
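For concreteness, the arithmetic works out as follows (the 21-second figure for the second `-O3` is inferred from the totals, not measured separately):

```cpp
// All values in seconds, taken from the experiment described above.
constexpr int firstO3 = 50;       // first -O3 on the unoptimized module
constexpr int secondO3 = 25;      // second -O3, no roundtrip in between
constexpr int roundtrip = 2;      // cost of the --roundtrip itself
constexpr int secondO3After = secondO3 - 4; // 21s: second -O3 after a roundtrip

constexpr int baseline = firstO3 + secondO3;               // 75s
constexpr int withRoundtrip =
    firstO3 + roundtrip + secondO3After;                   // 73s

// The total got 2s faster even though the roundtrip added 2s,
// so the second -O3 itself must have gained 4s, i.e. 4/25 = 16%.
static_assert(withRoundtrip == baseline - 2, "total is 2 seconds faster");
```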
Perhaps we should try to reuse instructions when rewriting more - we do that in OptimizeInstructions in some places, but it does make the code more complex. Perhaps helper utilities could do that in nice ways, though.
In theory we could consider doing some reordering ("defrag") that is more efficient than a roundtrip, automatically after enough passes have been run.
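Such a "defrag" might look something like the sketch below (the names and node layout are hypothetical, not Binaryen's IR): copy nodes in traversal order into fresh contiguous storage, getting the locality benefit of a roundtrip without paying for serialization and parsing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pointer-based node, as left scattered on the heap
// after many passes have rewritten parts of the tree.
struct OldNode {
  int op;
  std::vector<OldNode*> children;
};

// Compacted node: children are indices into one contiguous vector.
struct Node {
  int op;
  std::vector<size_t> children;
};

// Pre-order copy into `out`; children end up near their parents,
// so a pass that walks the tree visits memory roughly in order.
size_t compact(const OldNode* n, std::vector<Node>& out) {
  size_t idx = out.size();
  out.push_back({n->op, {}});
  for (auto* c : n->children) {
    size_t ci = compact(c, out);
    out[idx].children.push_back(ci); // index, not reference: safe across reallocs
  }
  return idx;
}
```

The open question is the same trade-off as with `--roundtrip`: the defrag itself costs time, so it only pays off if enough passes run afterwards to amortize it.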
cc #4165