-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8356708: C2: loop strip mining expansion doesn't take sunk stores into account #25717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back roland! A progress list of the required criteria for merging this PR into |
@rwestrel This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 94 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two. Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general. |
Test results (tier1-5 in Oracle's internal test system) look good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good otherwise!
Another question: could PhaseIdealLoop::try_move_store_before_loop
cause similar issues on strip-mined loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good, for completeness, to add a "Couple stores sunk in outer loop, store in inner loop" test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new commit.
@robcasloz That would have also been my question. @rwestrel why did we omit those |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel Thanks for looking into this! The fix seems reasonable given that we don't have phi's at the outer loop. But why don't we have those phis in the first place?
src/hotspot/share/opto/loopnode.cpp
Outdated
// Sunk stores are reachable from the memory state of the outer loop safepoint | ||
Node* safepoint = outer_safepoint(); | ||
Node* safepoint_mem = safepoint->in(TypeFunc::Memory); | ||
if (safepoint_mem->is_MergeMem()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have flipped the condition, and made an early exit condition from this. That way, the code is indented one level less. Just a suggestion, feel free to ignore :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit.
I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit. |
Co-authored-by: Roberto Castañeda Lozano <[email protected]>
Co-authored-by: Roberto Castañeda Lozano <[email protected]>
Co-authored-by: Emanuel Peter <[email protected]>
@rwestrel Dropping the In some way, not having the Adding the It's a difficult trade-off. |
Thanks for the reviews @robcasloz @eme64 |
That one moves stores out of the inner and outer loop so, no, I don't see a similar issue there. |
Thanks Roland, I'll re-run testing and come back with results on Monday. |
Thanks for the background, Roland! I think it would be worth exploring this, but I agree that there is a risk of silently affecting other loop optimizations. Luckily, the IR test framework gives us now a means to improve our confidence that changes in this area do not affect expected optimizations. Unfortunately, our current IR test coverage of loop optimizations is incomplete, so a pre-condition to exploring full SSA for strip-mined loops (and something worth doing in any case IMO) would be adding more IR tests checking that at least basic optimizations like peeling, unswitching, unrolling, range check elimination, etc. happen as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing went faster than I thought and did not reveal any issue, looks good, thank you for fixing this!
test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
Outdated
Show resolved
Hide resolved
test/hotspot/jtreg/compiler/loopstripmining/TestStoresSunkInOuterStripMinedLoop.java
Outdated
Show resolved
Hide resolved
…terStripMinedLoop.java Co-authored-by: Roberto Castañeda Lozano <[email protected]>
…terStripMinedLoop.java Co-authored-by: Roberto Castañeda Lozano <[email protected]>
Thanks for the review @robcasloz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rwestrel Thanks for the work here :)
As mentioned previously, it feels like we have allowed a violation of C2 IR assumptions, namely that there is always a Phi if we mutate the memory state. And now we need to clean up things after that violation. Not great, but I understand why we got here: for simplicity of the outer strip mined loop, and not affecting other optimizations.
Like I ask in one of my code comments below:
Is there any place where we describe why we do not have Phis at the outer loop, and why we think that should be ok? It would be good to have those assumptions documented. And then you can refer to the method OuterStripMinedLoopNode::handle_sunk_stores_at_expansion
, where we have to clean things up, and from there also back to the main description.
I see you have some minimal comments at PhaseIdealLoop::create_outer_strip_mined_loop
:
// to the loop head. The inner strip mined loop is left as it is. Only
// once loop optimizations are over, do we adjust the inner loop exit
// condition to limit its number of iterations, set the outer loop
// exit condition and add Phis to the outer loop head.
I think we should add some more info there, and link to and from OuterStripMinedLoopNode::handle_sunk_stores_at_expansion
.
Additionally / alternatively, you could also comment directly at the OuterStripMinedLoopNode
class.
I just would like to prevent the situation where you are the only person who is able to understand how the outer strip mined loop works ;)
src/hotspot/share/opto/loopnode.cpp
Outdated
@@ -2988,11 +2989,75 @@ void OuterStripMinedLoopNode::fix_sunk_stores(CountedLoopEndNode* inner_cle, Loo | |||
} | |||
} | |||
|
|||
// Sunk stores should be referenced from an outer loop memory Phi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you really need to give some longer explanation here why we need to do what you do here.
Also: is there anywhere a description why we do not have the phi already by default for outer loops? Because I think we should really describe that somewhere, and state our assumptions. And then you could also refer to that description from here, and from there to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in new commit.
src/hotspot/share/opto/loopnode.cpp
Outdated
void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) { | ||
Node* cle_exit_proj = inner_loop_exit(); | ||
|
||
// Sunk stores are pinned on the loop exit projection of the inner loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a description why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commit should address that one as well.
} | ||
#endif | ||
|
||
// Sunk stores are reachable from the memory state of the outer loop safepoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that the control of the safepoint is the cle_exit_proj
? Could we add an assert for that? So we are just looking for all memory between those two control nodes? Or is it more complicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. I added a call to verify_strip_mined()
that checks the shape of the outer loop including the control flow nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks :)
src/hotspot/share/opto/loopnode.cpp
Outdated
@@ -2988,11 +2989,75 @@ void OuterStripMinedLoopNode::fix_sunk_stores(CountedLoopEndNode* inner_cle, Loo | |||
} | |||
} | |||
|
|||
// Sunk stores should be referenced from an outer loop memory Phi | |||
void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the word "expansion" refer to? Could you also mention that in your code comment above, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed that one.
Co-authored-by: Emanuel Peter <[email protected]>
Co-authored-by: Emanuel Peter <[email protected]>
Co-authored-by: Emanuel Peter <[email protected]>
Done in new commit. Can you have another look @eme64 ? |
Re-running Oracle-internal testing... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates an improved documentation!
I have a few more minor suggestions :)
// As loop optimizations transform the inner loop, the outer strip mined loop stays mostly unchanged. The only exception | ||
// is nodes referenced from the SafePoint and sunk from the inner loop: they end up in the outer strip mined loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to reference handle_sunk_stores_when_finishing_construction
?
@@ -3111,6 +3213,8 @@ void OuterStripMinedLoopNode::adjust_strip_mined_loop(PhaseIterGVN* igvn) { | |||
} | |||
} | |||
|
|||
handle_sunk_stores_when_finishing_construction(igvn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above, where you insert the phis, you may want to say something about the case of Sunk Stores as well.
} | ||
#endif | ||
|
||
// Sunk stores are reachable from the memory state of the outer loop safepoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks :)
Co-authored-by: Emanuel Peter <[email protected]>
Co-authored-by: Emanuel Peter <[email protected]>
@TobiHartmann Had 2 questions:
|
Testing (commit fa550f2 applied on top of jdk-26+2) passed. |
New commit should cover your latest comments. Can you please have another look @eme64 ? |
Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild.
I don't think it is. It's an issue that exists since loop strip mining exists AFAICT. I haven't tried how far back the test reproduces it though. |
In my opinion, the fix is quite local and contained, so the risk of causing regressions does not seem too high. There is also still quite some time left to observe and react to issues before the RC phase. I would vote for JDK 25. |
I'm ok with that too. The fix looks reasonable, and not too risky. It is probably better to have this fix in than keeping the bug in the wild, as you say.
It would be good if you could find out what versions are really affected, and set the affected numbers accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
I conservatively added every version since loop strip mining was integrated given they are affected even if this particular test case doesn't fail. |
Thanks for the reviews @robcasloz @eme64 |
/integrate |
Going to push as commit c11f36e.
Your commit was automatically rebased without conflicts. |
test1()
has a counted loop with aStore
tofield
. ThatStore
is sunk out of loop. When the
OuterStripMinedLoop
is expanded, onlyPhi
s that exist at the inner loop are added to the outerloop. There's no
Phi
for the slice of the sunkStore
(becausethere's no
Store
left in the inner loop) so noPhi
is added forthat slice to the outer loop. As a result, there's a missing anti
dependency for
Load
offield
that's before the loop and it can bescheduled inside the outer strip mined loop which is incorrect.
test2()
is the same astest1()
but with a chain of 2Store
s.test3()
is another variant where aStore
is left in the inner loopafter one is sunk out of it so the inner loop still has a
Phi
. As aresult, the outer loop also gets a
Phi
but it's incorrectly wired asthe sunk
Store
should be the input along the backedge but isnot. That one doesn't cause any failure AFAICT.
The fix I propose is some extra logic at expansion of the
OuterStripMinedLoop
to handle these corner cases.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717
$ git checkout pull/25717
Update a local copy of the PR:
$ git checkout pull/25717
$ git pull https://git.openjdk.org/jdk.git pull/25717/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25717
View PR using the GUI difftool:
$ git pr show -t 25717
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25717.diff
Using Webrev
Link to Webrev Comment