8356708: C2: loop strip mining expansion doesn't take sunk stores into account #25717

Closed
wants to merge 17 commits into from

@rwestrel rwestrel commented Jun 10, 2025

test1() has a counted loop with a Store to a field. That Store
is sunk out of the loop. When the OuterStripMinedLoop is expanded, only
Phis that exist at the inner loop are added to the outer
loop. There's no Phi for the slice of the sunk Store (because
there's no Store left in the inner loop), so no Phi is added for
that slice to the outer loop. As a result, there's a missing
anti-dependency for a Load of the field that's before the loop, and it can be
scheduled inside the outer strip mined loop, which is incorrect.

test2() is the same as test1() but with a chain of 2 Stores.

test3() is another variant where a Store is left in the inner loop
after one is sunk out of it, so the inner loop still has a Phi. As a
result, the outer loop also gets a Phi, but it's incorrectly wired:
the sunk Store should be the input along the backedge but is
not. That one doesn't cause any failure AFAICT.

The fix I propose is some extra logic at expansion of the
OuterStripMinedLoop to handle these corner cases.
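
For readers without the test source at hand, here is a hypothetical sketch of the loop shape that test1() is described as having. It is not the test from this PR: the class name `SunkStoreSketch`, the method `loadThenStoreInLoop` and the warm-up count are all assumptions made for illustration, and whether C2 actually sinks the Store or strip mines the loop depends on the JVM version and flags.

```java
public class SunkStoreSketch {
    int field;

    // Hypothetical method, not the PR's test1(): a Load of `field` before a
    // counted loop whose body Stores to `field` on every iteration. Only the
    // last Store is observable after the loop, so a compiler may sink it.
    static int loadThenStoreInLoop(SunkStoreSketch obj, int n) {
        int before = obj.field;          // must observe the pre-loop value
        for (int i = 0; i < n; i++) {
            obj.field = i;               // candidate for sinking out of the loop
        }
        return before;
    }

    public static void main(String[] args) {
        SunkStoreSketch obj = new SunkStoreSketch();
        // Assumed warm-up count, meant to give the JIT a chance to compile the method.
        for (int run = 0; run < 20_000; run++) {
            obj.field = 42;
            int before = loadThenStoreInLoop(obj, 100);
            if (before != 42 || obj.field != 99) {
                throw new AssertionError("run " + run + ": before=" + before + ", field=" + obj.field);
            }
        }
        System.out.println("load always observed the pre-loop value");
    }
}
```

If the Load were scheduled inside the loop, as in the bug described above, `before` could observe a value written by the loop instead of 42, which is what the check in `main` guards against.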


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8356708: C2: loop strip mining expansion doesn't take sunk stores into account (Bug - P2)(⚠️ The fixVersion in this issue is [25] but the fixVersion in .jcheck/conf is 26, a new backport will be created when this pr is integrated.)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25717/head:pull/25717
$ git checkout pull/25717

Update a local copy of the PR:
$ git checkout pull/25717
$ git pull https://git.openjdk.org/jdk.git pull/25717/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25717

View PR using the GUI difftool:
$ git pr show -t 25717

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25717.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Jun 10, 2025

👋 Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jun 10, 2025

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8356708: C2: loop strip mining expansion doesn't take sunk stores into account

Reviewed-by: rcastanedalo, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 10, 2025

openjdk bot commented Jun 10, 2025

@rwestrel The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge bot commented Jun 10, 2025

@robcasloz

Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.

Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.

@robcasloz

> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.

Test results (tier1-5 in Oracle's internal test system) look good.

@robcasloz robcasloz left a comment

Looks good otherwise!

Another question: could PhaseIdealLoop::try_move_store_before_loop cause similar issues on strip-mined loops?

Contributor

It would be good, for completeness, to add a "Couple stores sunk in outer loop, store in inner loop" test.

Contributor Author

Done in the new commit.

eme64 commented Jun 12, 2025

> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.

@robcasloz That would have also been my question. @rwestrel why did we omit those Phis at the outer strip-mined loop in the first place? Is that not asking for all sorts of trouble and special handling?

@eme64 eme64 left a comment

@rwestrel Thanks for looking into this! The fix seems reasonable given that we don't have Phis at the outer loop. But why don't we have those Phis in the first place?

// Sunk stores are reachable from the memory state of the outer loop safepoint
Node* safepoint = outer_safepoint();
Node* safepoint_mem = safepoint->in(TypeFunc::Memory);
if (safepoint_mem->is_MergeMem()) {
Contributor

I would have flipped the condition, and made an early exit condition from this. That way, the code is indented one level less. Just a suggestion, feel free to ignore :)

Contributor Author

Done in new commit.

@rwestrel

> Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
>
> Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.

I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the OuterStripMinedLoop.

It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit.

eme64 commented Jun 12, 2025

@rwestrel Dropping the Phi means there are fewer nodes, and probably it is easier to let things float out from the inner loop. That probably works fine for data nodes, and loads that float up. But for sunk stores it doesn't work... at least not without your cleanup now.

In some way, not having the Phi violates an assumption of the C2 IR, namely that if you can change the value/memory during the loop, then you need a Phi to model that. It is a bit scary to handle outer strip mined loops differently... sure, we want the old optimizations to still work, but we have no idea what "new" things are now badly optimized, like in this case here. Any new optimization also has to be aware that, in the case of strip mined loops, the absence of a Phi does not mean there cannot be a mutation on that data/memory. Not great :/

Adding the Phis in all cases would probably break some optimizations, as you say. Maybe we would have to add dedicated skip_strip_mining logic all over the place. It would be difficult to know where we are missing them. That's not great either :/

It's a difficult trade-off.

rwestrel commented Jun 12, 2025

Thanks for the reviews @robcasloz @eme64
Change is ready for another pass.

@rwestrel

> Another question: could PhaseIdealLoop::try_move_store_before_loop cause similar issues on strip-mined loops?

That one moves stores out of the inner and outer loop so, no, I don't see a similar issue there.

@robcasloz

> Thanks for the reviews @robcasloz @eme64
> Change is ready for another pass.

Thanks Roland, I'll re-run testing and come back with results on Monday.

@robcasloz

> > Thanks for working on this, Roland. I just submitted some testing, will come back with the results in a day or two.
> >
> > Generally, I agree with your proposed approach of handling the case at expansion time as a low-risk fix for JDK 25. But as future work, would it be feasible to maintain regular SSA form for outer strip-mined loops (adding memory and data phi nodes at both loop levels) rather than omitting phi nodes for the outer loops and "repairing" SSA on macro expansion, or is there any fundamental obstacle in doing the former? It would have prevented issues like this, and feels like a more principled and robust approach in general.
>
> I don't think there's a fundamental obstacle. The reason I implemented loop strip mining that way is that I was concerned it would be complicated for existing transformations to be supported without major code change and that there was a chance subtle issues would creep in (such as some optimizations not happening any more). So I tried to have a minimal set of extra nodes for strip mined loops initially so existing transformations would simply need to skip over the OuterStripMinedLoop.
>
> It's quite possible that having the full outer strip mined loop early on works fine and that there's no need for changes all over the loop optimizations. I suppose someone would need to give it a try. This said, I still think keeping the graph simple when crucial transformations happen has some merit.

Thanks for the background, Roland!

I think it would be worth exploring this, but I agree that there is a risk of silently affecting other loop optimizations. Luckily, the IR test framework now gives us a means to improve our confidence that changes in this area do not affect expected optimizations. Unfortunately, our current IR test coverage of loop optimizations is incomplete, so a pre-condition to exploring full SSA for strip-mined loops (and something worth doing in any case IMO) would be adding more IR tests checking that at least basic optimizations like peeling, unswitching, unrolling, range check elimination, etc. happen as expected.

@robcasloz robcasloz left a comment

Testing went faster than I thought and did not reveal any issue, looks good, thank you for fixing this!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 13, 2025
rwestrel and others added 2 commits June 16, 2025 17:20
…terStripMinedLoop.java

Co-authored-by: Roberto Castañeda Lozano <[email protected]>
…terStripMinedLoop.java

Co-authored-by: Roberto Castañeda Lozano <[email protected]>
@rwestrel

Thanks for the review @robcasloz

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jun 16, 2025
@eme64 eme64 left a comment

@rwestrel Thanks for the work here :)

As mentioned previously, it feels like we have allowed a violation of C2 IR assumptions, namely that there is always a Phi if we mutate the memory state. And now we need to clean up things after that violation. Not great, but I understand why we got here: for simplicity of the outer strip mined loop, and not affecting other optimizations.

As I ask in one of my code comments below:
Is there any place where we describe why we do not have Phis at the outer loop, and why we think that should be ok? It would be good to have those assumptions documented. And then you can refer to the method OuterStripMinedLoopNode::handle_sunk_stores_at_expansion, where we have to clean things up, and from there also back to the main description.

I see you have some minimal comments at PhaseIdealLoop::create_outer_strip_mined_loop:

// to the loop head. The inner strip mined loop is left as it is. Only
// once loop optimizations are over, do we adjust the inner loop exit
// condition to limit its number of iterations, set the outer loop
// exit condition and add Phis to the outer loop head.

I think we should add some more info there, and link to and from OuterStripMinedLoopNode::handle_sunk_stores_at_expansion.

Additionally / alternatively, you could also comment directly at the OuterStripMinedLoopNode class.

I just would like to prevent the situation where you are the only person who is able to understand how the outer strip mined loop works ;)
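
As background for the discussion above, here is a source-level sketch of what loop strip mining amounts to once the outer loop has been finished: an inner counted loop bounded to a strip of iterations, wrapped in an outer loop where the safepoint sits. This is only a hand-written analogy in Java, not what C2 emits; `StripMiningSketch`, `sumStripMined` and the `strip` parameter are hypothetical names (in HotSpot the strip length is controlled by `-XX:LoopStripMiningIter`).

```java
public class StripMiningSketch {
    // Plain counted loop, the shape the programmer writes.
    static long sumPlain(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hand-written two-level equivalent. `strip` must be positive.
    static long sumStripMined(int[] a, int strip) {
        if (strip <= 0) {
            throw new IllegalArgumentException("strip must be positive");
        }
        long sum = 0;   // carried around both loops: conceptually a Phi at each loop head
        int i = 0;
        while (i < a.length) {  // outer loop: a safepoint poll would sit here, once per strip
            // Inner loop limit, computed in long to avoid int overflow.
            int limit = (int) Math.min((long) i + strip, a.length);
            for (; i < limit; i++) {  // inner counted loop: no safepoint poll
                sum += a[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[2500];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        // Both forms compute the same sum; only the loop structure differs.
        System.out.println(sumPlain(a) == sumStripMined(a, 1000));
    }
}
```

In the actual IR, as the comment quoted above says, the Phis at the outer loop head are only added once loop optimizations are over, which is why a Store that ends up between the two loop levels needs the special handling this PR adds.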

@@ -2988,11 +2989,75 @@ void OuterStripMinedLoopNode::fix_sunk_stores(CountedLoopEndNode* inner_cle, Loo
}
}

// Sunk stores should be referenced from an outer loop memory Phi
Contributor

I think you really need to give some longer explanation here why we need to do what you do here.

Also: is there a description anywhere of why we do not already have the Phi by default for outer loops? Because I think we should really describe that somewhere, and state our assumptions. And then you could also refer to that description from here, and from there to here.

Contributor Author

Updated in new commit.

void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) {
Node* cle_exit_proj = inner_loop_exit();

// Sunk stores are pinned on the loop exit projection of the inner loop
Contributor

Can you add a description why?

Contributor Author

New commit should address that one as well.

}
#endif

// Sunk stores are reachable from the memory state of the outer loop safepoint
Contributor

Is it true that the control of the safepoint is the cle_exit_proj? Could we add an assert for that? So we are just looking for all memory between those two control nodes? Or is it more complicated?

Contributor Author

Yes, it is. I added a call to verify_strip_mined() that checks the shape of the outer loop including the control flow nodes.

Contributor

Great, thanks :)

@@ -2988,11 +2989,75 @@ void OuterStripMinedLoopNode::fix_sunk_stores(CountedLoopEndNode* inner_cle, Loo
}
}

// Sunk stores should be referenced from an outer loop memory Phi
void OuterStripMinedLoopNode::handle_sunk_stores_at_expansion(PhaseIterGVN* igvn) {
Contributor

What does the word "expansion" refer to? Could you also mention that in your code comment above, please?

Contributor Author

I renamed that one.

@rwestrel

> I see you have some minimal comments at PhaseIdealLoop::create_outer_strip_mined_loop:
>
> // to the loop head. The inner strip mined loop is left as it is. Only
> // once loop optimizations are over, do we adjust the inner loop exit
> // condition to limit its number of iterations, set the outer loop
> // exit condition and add Phis to the outer loop head.
>
> I think we should add some more info there, and link to and from OuterStripMinedLoopNode::handle_sunk_stores_at_expansion.

Done in new commit. Can you have another look @eme64 ?

@robcasloz

Re-running Oracle-internal testing...

@eme64 eme64 left a comment

Thanks for the updates and improved documentation!

I have a few more minor suggestions :)

Comment on lines +332 to +333
// As loop optimizations transform the inner loop, the outer strip mined loop stays mostly unchanged. The only exception
// is nodes referenced from the SafePoint and sunk from the inner loop: they end up in the outer strip mined loop.
Contributor

Do you want to reference handle_sunk_stores_when_finishing_construction?

@@ -3111,6 +3213,8 @@ void OuterStripMinedLoopNode::adjust_strip_mined_loop(PhaseIterGVN* igvn) {
}
}

handle_sunk_stores_when_finishing_construction(igvn);
Contributor

Above, where you insert the phis, you may want to say something about the case of Sunk Stores as well.

}
#endif

// Sunk stores are reachable from the memory state of the outer loop safepoint
Contributor

Great, thanks :)

eme64 commented Jun 18, 2025

@TobiHartmann had 2 questions:

  • Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK 26 first, and backport a little later?
  • Is this really a regression from JDK-8281322? If not, the affects version in JBS should be updated such that we'll consider this for backporting.

@robcasloz

> Re-running Oracle-internal testing...

Testing (commit fa550f2 applied on top of jdk-26+2) passed.

@rwestrel

> I have a few more minor suggestions :)

New commit should cover your latest comments. Can you please have another look @eme64 ?

@rwestrel

> * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK 26 first, and backport a little later?

Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild.

> * Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting.

I don't think it is. It's an issue that exists since loop strip mining exists AFAICT. I haven't tried how far back the test reproduces it though.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 19, 2025
@robcasloz

> > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK 26 first, and backport a little later?
>
> Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild.

In my opinion, the fix is quite local and contained, so the risk of causing regressions does not seem too high. There is also still quite some time left to observe and react to issues before the RC phase. I would vote for JDK 25.

eme64 commented Jun 20, 2025

> > * Is this really something we want to put into JDK 25? Feels high-risk and it's an old issue after all. Maybe we can push this to JDK 26 first, and backport a little later?
>
> Either way is fine with me. It does feel like a nasty issue that wouldn't be easy to diagnose if someone runs into it in the wild.

I'm ok with that too. The fix looks reasonable, and not too risky. It is probably better to have this fix in than keeping the bug in the wild, as you say.

> > * Is this really a regression from [JDK-8281322](https://bugs.openjdk.org/browse/JDK-8281322)? If not, the affects version in JBS should be updated such that we'll consider this for backporting.
>
> I don't think it is. It's an issue that exists since loop strip mining exists AFAICT. I haven't tried how far back the test reproduces it though.

It would be good if you could find out what versions are really affected, and set the affected numbers accordingly.

@eme64 eme64 left a comment

@rwestrel The patch looks good, thanks for the work!

That said, and as mentioned above: we should probably investigate if we can add the Phis from the beginning, so that we do not violate the C2 IR assumptions.

@rwestrel

> That said, and as mentioned above: we should probably investigate if we can add the Phis from the beginning, so that we do not violate the C2 IR assumptions.

I filed: https://bugs.openjdk.org/browse/JDK-8360096

@rwestrel

> It would be good if you could find out what versions are really affected, and set the affected numbers accordingly.

I conservatively added every version since loop strip mining was integrated given they are affected even if this particular test case doesn't fail.

@rwestrel

Thanks for the reviews @robcasloz @eme64

@rwestrel

/integrate

openjdk bot commented Jun 20, 2025

Going to push as commit c11f36e.
Since your change was applied there have been 95 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 20, 2025
@openjdk openjdk bot closed this Jun 20, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 20, 2025

openjdk bot commented Jun 20, 2025

@rwestrel Pushed as commit c11f36e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
