Skip to content

[VPlan] Add exit phi operands during initial construction (NFC). #136455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 23, 2025
18 changes: 6 additions & 12 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9373,11 +9373,8 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan,
}
}

// Collect VPIRInstructions for phis in the exit blocks that are modeled
// in VPlan and add the exiting VPValue as operand.
static SetVector<VPIRInstruction *>
collectUsersInExitBlocks(Loop *OrigLoop, VPRecipeBuilder &Builder,
VPlan &Plan) {
// Collect VPIRInstructions for phis in the exit block from the latch only.
static SetVector<VPIRInstruction *> collectUsersInLatchExitBlock(VPlan &Plan) {
SetVector<VPIRInstruction *> ExitUsersToFix;
for (VPIRBasicBlock *ExitVPBB : Plan.getExitBlocks()) {
// Nothing to do for unreachable exit blocks.
Expand All @@ -9393,11 +9390,8 @@ collectUsersInExitBlocks(Loop *OrigLoop, VPRecipeBuilder &Builder,
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: can R iterate over ExitVPBB->phis() above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 8c83355, thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: is this early continue needed:

    // Nothing to do for unreachable exit blocks.
    if (ExitVPBB->getNumPredecessors() == 0)
      continue;

given that unreachable exit blocks have been emptied of their all their recipes - including phi ones?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: loop-unswitch this condition

      if (ExitVPBB->getSinglePredecessor() != Plan.getMiddleBlock()) {
        assert(ExitIRI->getNumOperands() ==
                   ExitVPBB->getPredecessors().size() &&
               "early-exit must update exit values on construction");
        continue;

rather than check it for all phi's?
assert can be removed given that the number of operands of all phi recipes should be equal to the number of predecessors - check in verifier instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks

}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above "and add the exiting VPValue as operand" is now obsolete. Worth noting that users of multiple (early) exits are excluded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, aslo dropped .. modeled in VPlan, as all are now modeled in VPlan.

PHINode &ExitPhi = ExitIRI->getIRPhi();
BasicBlock *ExitingBB = OrigLoop->getLoopLatch();
Value *IncomingValue = ExitPhi.getIncomingValueForBlock(ExitingBB);
VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
ExitIRI->addOperand(V);
assert(ExitIRI->getNumOperands() == 1 && "must have a single operand");
VPValue *V = ExitIRI->getOperand(0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting that ExitIRI has a single operand? Expected to match its single middle-block predecessor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

if (V->isLiveIn())
continue;
assert(V->getDefiningRecipe()->getParent()->getEnclosingLoopRegion() &&
Expand Down Expand Up @@ -9426,7 +9420,7 @@ addUsersInExitBlocks(VPlan &Plan,
ExitIRI->getParent()->getSinglePredecessor() == MiddleVPBB &&
"exit values from early exits must be fixed when branch to "
"early-exit is added");
ExitIRI->extractLastLaneOfOperand(B);
ExitIRI->extractLastLaneOfFirstOperand(B);
Comment on lines 9420 to +9423
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: addUsersInExitBlocks() --> addExtractsToExitUsers()? This method introduces extracts for exit users of a single (latch) exit block, which are expected to be single operand VPIRPhi's, rather than adds users in multiple/all exit blocks. Drop "Add exit values to \p Plan."?

}
}

Expand Down Expand Up @@ -9767,7 +9761,7 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
DenseMap<VPValue *, VPValue *> IVEndValues;
addScalarResumePhis(RecipeBuilder, *Plan, IVEndValues);
SetVector<VPIRInstruction *> ExitUsersToFix =
collectUsersInExitBlocks(OrigLoop, RecipeBuilder, *Plan);
collectUsersInLatchExitBlock(*Plan);
addExitUsersForFirstOrderRecurrences(*Plan, ExitUsersToFix);
addUsersInExitBlocks(*Plan, ExitUsersToFix);

Expand Down
8 changes: 4 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -1161,10 +1161,10 @@ class VPIRInstruction : public VPRecipeBase {
return true;
}

/// Update the recipes single operand to the last lane of the operand using \p
/// Builder. Must only be used for single operand VPIRInstructions wrapping a
/// PHINode.
void extractLastLaneOfOperand(VPBuilder &Builder);
/// Update the recipes first operand to the last lane of the operand using \p
/// Builder. Must only be used for VPIRInstructions with at least one operand
/// wrapping a PHINode.
void extractLastLaneOfFirstOperand(VPBuilder &Builder);
};

/// An overlay for VPIRInstructions wrapping PHI nodes enabling convenient use
Expand Down
40 changes: 33 additions & 7 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,23 @@ std::unique_ptr<VPlan> PlainCFGBuilder::buildPlainCFG(
Plan->getEntry()->setOneSuccessor(getOrCreateVPBB(TheLoop->getHeader()));
Plan->getEntry()->setPlan(&*Plan);

// Fix VPlan loop-closed-ssa exit phi's by adding incoming operands to the
// VPIRInstructions wrapping them.
// // Note that the operand order corresponds to IR predecessor order, and may
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// //" >> "//"

// need adjusting when VPlan predecessors are added, if an exit block has
// multiple predecessor.
for (auto *EB : Plan->getExitBlocks()) {
for (VPRecipeBase &R : EB->phis()) {
auto *PhiR = cast<VPIRPhi>(&R);
PHINode &Phi = PhiR->getIRPhi();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert PhiR is still w/o any operand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

assert(PhiR->getNumOperands() == 0 &&
"no phi operands should be added yet");
for (BasicBlock *Pred : predecessors(EB->getIRBasicBlock()))
PhiR->addOperand(
getOrCreateVPOperand(Phi.getIncomingValueForBlock(Pred)));
Comment on lines +366 to +368
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order of operands set here corresponds to the order of predecessors of underlying IRBB, before EB VPBB has predecessors. This inconsistency requires attention later, when these predecessors are added, possibly in a different order. May be worth leaving a note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

}
}

for (const auto &[IRBB, VPB] : BB2VPBB)
VPB2IRBB[VPB] = IRBB;

Expand Down Expand Up @@ -462,19 +479,28 @@ void VPlanTransforms::createLoopRegions(VPlan &Plan, Type *InductionTy,

VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
return;
}

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
// 1) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Remove the recipes
// from the exit blocks.
// 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 2) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.

if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
// The exit blocks are unreachable, remove their recipes to make sure no
// users remain that may pessimize transforms.
for (auto *EB : Plan.getExitBlocks()) {
for (VPRecipeBase &R : make_early_inc_range(*EB))
R.eraseFromParent();
}
Comment on lines +495 to +500
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deserves mentioning in createLoopRegions()'s documentation? Which indeed does much more than create loop regions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be best to separate region creation from other 'skeleton additions' separately instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so - better build the complete skeleton CFG first, then convert it to HCFG by converting its loops into regions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do separately, thanks!

return;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case

  // 2) If we require a scalar epilogue, there is no conditional branch as
  //    we unconditionally branch to the scalar preheader.  Do nothing.

is handle by early return above. Better place the explanation earlier, and replace "Do nothing" with "Empty the unreachable exit blocks of their recipes". The connection from scalar loop to exit blocks is (currently) outside of VPlan's scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks. Also changed 2->1 as this is the first handled case.

BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
Expand Down
6 changes: 3 additions & 3 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1137,10 +1137,10 @@ InstructionCost VPIRInstruction::computeCost(ElementCount VF,
return 0;
}

void VPIRInstruction::extractLastLaneOfOperand(VPBuilder &Builder) {
void VPIRInstruction::extractLastLaneOfFirstOperand(VPBuilder &Builder) {
assert(isa<PHINode>(getInstruction()) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: should this be implemented for VPIRPhi instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update separately, thanks

"can only add exiting operands to phi nodes");
assert(getNumOperands() == 1 && "must have a single operand");
"can only update exiting operands to phi nodes");
assert(getNumOperands() > 0 && "must have at least one operand");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yet we continue to extract the first operand, only? extractLastLaneOf[First]Operand()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated the name, thanks

VPValue *Exiting = getOperand(0);
if (!Exiting->isLiveIn()) {
Comment on lines 1144 to 1145
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPValue *Exiting = getOperand(0);
if (!Exiting->isLiveIn()) {
if (Exiting->isLiveIn())
return;
VPValue *Exiting = getOperand(0);

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will adjust separately, thanks

LLVMContext &Ctx = getInstruction().getContext();
Expand Down
34 changes: 22 additions & 12 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2505,35 +2505,45 @@ void VPlanTransforms::handleUncountableEarlyExit(
VPBuilder EarlyExitB(VectorEarlyExitVPBB);
for (VPRecipeBase &R : VPEarlyExitBlock->phis()) {
auto *ExitIRI = cast<VPIRPhi>(&R);
PHINode &ExitPhi = ExitIRI->getIRPhi();
VPValue *IncomingFromEarlyExit = RecipeBuilder.getVPValueOrAddLiveIn(
ExitPhi.getIncomingValueForBlock(UncountableExitingBlock));

// Early exit operand should always be last, i.e., 0 if VPEarlyExitBlock has
// a single predecessor and 1 if it has two.
unsigned EarlyExitIdx = ExitIRI->getNumOperands() - 1;
if (OrigLoop->getUniqueExitBlock()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: better ask instead if VPEarlyExitBlock has two predecessors? One (the last) was added above - VectorEarlyExitVPBB. This should match the case where OrigLoop has a unique exit block - which would be aka VPEarlyExitBlock, or rather its underlying EarlyExitIRBB.

Suggested change
if (OrigLoop->getUniqueExitBlock()) {
if (VPEarlyExitBlock->getNumPredecessors() == 2) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check separately, thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It's more consistent with following explanation.

// If there's a unique exit block, VPEarlyExitBlock has 2 predecessors
// (MiddleVPBB and NewMiddle). Add the incoming value from MiddleVPBB
// which is coming from the original latch.
VPValue *IncomingFromLatch = RecipeBuilder.getVPValueOrAddLiveIn(
ExitPhi.getIncomingValueForBlock(OrigLoop->getLoopLatch()));
ExitIRI->addOperand(IncomingFromLatch);
ExitIRI->extractLastLaneOfOperand(MiddleBuilder);
// If VPEarlyExitBlock has two predecessors, they are already ordered such
// that early exit is second (and latch exit is first), by construction.
// But its underlying IRBB (EarlyExitIRBB) may have its predecessors
// ordered the other way around, and it is the order of the latter which
// corresponds to the order of operands of VPEarlyExitBlock's phi recipes.
// Therefore, if early exit (UncountableExitingBlock) is the first
// predecessor of EarlyExitIRBB, we swap the operands of phi recipes,
// thereby bringing them to match VPEarlyExitBlock's predecessor order,
// with early exit being last (second). Otherwise they already match.
if (*pred_begin(VPEarlyExitBlock->getIRBasicBlock()) ==
UncountableExitingBlock)
ExitIRI->swapOperands();

// The first of two operands corresponds to the latch exit, via MiddleVPBB
// predecessor. Extract its last lane.
ExitIRI->extractLastLaneOfFirstOperand(MiddleBuilder);
}

VPValue *IncomingFromEarlyExit = ExitIRI->getOperand(EarlyExitIdx);
auto IsVector = [](ElementCount VF) { return VF.isVector(); };
// When the VFs are vectors, need to add `extract` to get the incoming value
// from early exit. When the range contains scalar VF, limit the range to
// scalar VF to prevent mis-compilation for the range containing both scalar
// and vector VFs.
if (!IncomingFromEarlyExit->isLiveIn() &&
LoopVectorizationPlanner::getDecisionAndClampRange(IsVector, Range)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: clamping range inside VPlanTransform? Limiting the range to scalar VF - is another VPlan constructed for the vector (sub)range?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is done early on when we are clamping the loop range for other reasons as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent) Still puzzled about clamping the range at this stage, when all VPlans were already created following range clampings. Rather than say asserting that the range contains either scalar or vector VF's but not both, and introduce extracts if it's the latter. Extracts added above for latch exit need not check vector VF's?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the transform isn't run when all VPlans have been created, but in tryToBuildVPlanWithVPRecipes where we the range is also clamped in other places ( https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L9758)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, very well, thanks.

(Independent) May be good to indicate somehow transforms that may clamp the range - that belong to tryToBuildVPlanWithVPRecipes stage, and prevent them from operating afterwards, perhaps by disabling range clamping afterwards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range should only be available in tryToBuildVPlanWithVPRecipes and not subsequent optimizations; we could seaprate things even clearer by first building all VPlans (i.e. only do tryToBuildVPlanWithVPRecipes) and then optimize all VPlans separatel

// Update the incoming value from the early exit.
VPValue *FirstActiveLane = EarlyExitB.createNaryOp(
VPInstruction::FirstActiveLane, {EarlyExitTakenCond}, nullptr,
"first.active.lane");
IncomingFromEarlyExit = EarlyExitB.createNaryOp(
Instruction::ExtractElement, {IncomingFromEarlyExit, FirstActiveLane},
nullptr, "early.exit.value");
ExitIRI->setOperand(EarlyExitIdx, IncomingFromEarlyExit);
}
ExitIRI->addOperand(IncomingFromEarlyExit);
}
MiddleBuilder.createNaryOp(VPInstruction::BranchOnCond, {IsEarlyExitTaken});

Expand Down
Loading