Talking with @tlively, we realized that `--roundtrip` will restructure code back into a cache-friendly form: it serializes the module and then reads it back, and when we read it, we allocate adjacent instructions contiguously in an arena. Imagine we begin with unoptimized code; optimizations then quickly add pointers to arbitrary places in memory, but doing a `--roundtrip` can "fix" that, and might be worth it if we run more optimizations afterwards.
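As a rough illustration of why re-reading helps, here is a minimal bump-arena sketch (this is not Binaryen's actual arena, just the general idea): nodes allocated in creation order land contiguously in large chunks, so instructions that are adjacent in the function body are also adjacent in memory, and a later pass walking the IR touches memory mostly sequentially.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified bump allocator: hands out consecutive slices of a large chunk.
// Reading a serialized module allocates nodes in traversal order, so the
// resulting IR has good spatial locality for subsequent passes.
struct Arena {
  static constexpr size_t ChunkSize = 1 << 16;
  std::vector<std::unique_ptr<uint8_t[]>> chunks;
  size_t used = 0;

  void* alloc(size_t bytes) {
    bytes = (bytes + 7) & ~size_t(7); // keep 8-byte alignment
    if (chunks.empty() || used + bytes > ChunkSize) {
      chunks.push_back(std::make_unique<uint8_t[]>(ChunkSize));
      used = 0;
    }
    void* ret = chunks.back().get() + used;
    used += bytes;
    return ret;
  }
};
```

In contrast, once optimization passes start replacing nodes, each replacement is a fresh allocation somewhere else on the heap, and pointer-chasing during later passes degrades into cache misses.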
To measure this, I took a large unoptimized Kotlin testcase I have. `-O3` takes 50 seconds, and a second `-O3` after it takes 25 seconds (it makes sense that it would be faster, since after the first cycle there is a lot less code). Adding a `--roundtrip` between the two adds 2 seconds for the roundtrip itself, but makes the total time 2 seconds faster. So, ignoring the roundtrip's own time, we gain 4 seconds on the second `-O3`, which is something like 15% faster.
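For concreteness, the arithmetic works out as follows (the 21-second figure for the second `-O3` is inferred from the totals, not measured separately):

```cpp
// All values in seconds, taken from the experiment described above.
constexpr int firstO3 = 50;       // first -O3 on the unoptimized module
constexpr int secondO3 = 25;      // second -O3, no roundtrip in between
constexpr int roundtrip = 2;      // cost of the --roundtrip itself
constexpr int secondO3After = secondO3 - 4; // 21s: second -O3 after a roundtrip

constexpr int baseline = firstO3 + secondO3;               // 75s
constexpr int withRoundtrip =
    firstO3 + roundtrip + secondO3After;                   // 73s

// The total got 2s faster even though the roundtrip added 2s,
// so the second -O3 itself must have gained 4s, i.e. 4/25 = 16%.
static_assert(withRoundtrip == baseline - 2, "total is 2 seconds faster");
```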
Perhaps we should try to reuse instructions when rewriting more - we do that in OptimizeInstructions in some places, but it does make the code more complex. Perhaps helper utilities could do that in nice ways, though.
In theory we could consider doing some reordering ("defrag") that is more efficient than a roundtrip, automatically after enough passes have been run.
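Such a "defrag" might look something like the sketch below (the names and node layout are hypothetical, not Binaryen's IR): copy nodes in traversal order into fresh contiguous storage, getting the locality benefit of a roundtrip without paying for serialization and parsing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pointer-based node, as left scattered on the heap
// after many passes have rewritten parts of the tree.
struct OldNode {
  int op;
  std::vector<OldNode*> children;
};

// Compacted node: children are indices into one contiguous vector.
struct Node {
  int op;
  std::vector<size_t> children;
};

// Pre-order copy into `out`; children end up near their parents,
// so a pass that walks the tree visits memory roughly in order.
size_t compact(const OldNode* n, std::vector<Node>& out) {
  size_t idx = out.size();
  out.push_back({n->op, {}});
  for (auto* c : n->children) {
    size_t ci = compact(c, out);
    out[idx].children.push_back(ci); // index, not reference: safe across reallocs
  }
  return idx;
}
```

The open question is the same trade-off as with `--roundtrip`: the defrag itself costs time, so it only pays off if enough passes run afterwards to amortize it.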
cc #4165