[InstCombine] Fold select(load, val) + store into llvm.masked.store #144298

Closed

Conversation

abhishek-kaushik22
Contributor

This patch adds a new InstCombine optimization that transforms a pattern of the form:
```
%load = load <8 x i32>, ptr %ptr, align 32
%sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
store <8 x i32> %sel, ptr %ptr, align 32
```
into:
```
@llvm.masked.store.v8i32.p0(<8 x i32> %x, ptr %ptr, i32 32, <8 x i1> %cmp)
```
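
For intuition, here is a scalar model of both sides of the fold (a hypothetical sketch for `<8 x i32>`, not part of the patch): the load/select/store sequence rewrites every lane, while the masked store only writes lanes whose mask bit is set, so both leave memory with the same contents. This is also why the patch must check for intervening writes and reject volatile/atomic accesses.

```cpp
// Hypothetical scalar model of the fold (illustrative only): both
// functions leave ptr[0..7] with identical contents.
void selectStore(int *ptr, const int *x, const bool *cmp) {
  for (int i = 0; i < 8; ++i) {
    int loaded = ptr[i];              // %load
    int sel = cmp[i] ? x[i] : loaded; // %sel
    ptr[i] = sel;                     // every lane is stored back
  }
}

void maskedStore(int *ptr, const int *x, const bool *cmp) {
  for (int i = 0; i < 8; ++i)
    if (cmp[i])
      ptr[i] = x[i]; // masked-off lanes are never written
}
```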
@llvmbot
Member

llvmbot commented Jun 16, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Abhishek Kaushik (abhishek-kaushik22)

Changes

This patch adds a new InstCombine optimization that transforms a pattern of the form:

```
%load = load <8 x i32>, ptr %ptr, align 32
%sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
store <8 x i32> %sel, ptr %ptr, align 32
```

into:

```
@llvm.masked.store.v8i32.p0(<8 x i32> %x, ptr %ptr, i32 32, <8 x i1> %cmp)
```

Full diff: https://github.com/llvm/llvm-project/pull/144298.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+45)
  • (added) llvm/test/Transforms/InstCombine/masked-store.ll (+63)
  • (modified) llvm/test/Transforms/LoopVectorize/if-conversion.ll (+1-2)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index 1d208de75db3b..ab0228d33db21 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -1363,6 +1363,48 @@ static bool equivalentAddressValues(Value *A, Value *B) {
   return false;
 }
 
+// Combine
+//   %load = load <8 x i32>, ptr %ptr, align 32
+//   %sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
+//   store <8 x i32> %sel, ptr %ptr, align 32
+// to
+//   @llvm.masked.store.v8i32.p0(<8 x i32> %x, ptr %ptr, i32 32, <8 x i1> %cmp)
+static bool combineToMaskedStore(InstCombinerImpl &IC, StoreInst &Store) {
+  Value *StoredValue = Store.getValueOperand();
+  auto *Select = dyn_cast<SelectInst>(StoredValue);
+  if (!Select || !StoredValue->getType()->isVectorTy())
+    return false;
+
+  Value *Condition = Select->getCondition();
+  Value *TrueValue = Select->getTrueValue();
+  Value *FalseValue = Select->getFalseValue();
+
+  const auto *Load = dyn_cast<LoadInst>(FalseValue);
+  if (!Load || Load->getPointerOperand() != Store.getPointerOperand())
+    return false;
+
+  if (Load->isVolatile() || Store.isVolatile() || Load->isAtomic() ||
+      Store.isAtomic())
+    return false;
+
+  Value *Pointer = Store.getPointerOperand();
+
+  for (const auto *I = Load->getNextNode(); I && I != &Store;
+       I = I->getNextNode()) {
+    if (I->mayHaveSideEffects())
+      return false;
+
+    if (const auto *OtherStore = dyn_cast<StoreInst>(I)) {
+      if (OtherStore->getPointerOperand() == Pointer)
+        return false;
+    }
+  }
+
+  IC.Builder.CreateMaskedStore(TrueValue, Pointer, Store.getAlign(), Condition);
+
+  return true;
+}
+
 Instruction *InstCombinerImpl::visitStoreInst(StoreInst &SI) {
   Value *Val = SI.getOperand(0);
   Value *Ptr = SI.getOperand(1);
@@ -1375,6 +1417,9 @@ Instruction *InstCombinerImpl::visitStoreInst(StoreInst &SI) {
   if (unpackStoreToAggregate(*this, SI))
     return eraseInstFromFunction(SI);
 
+  if (combineToMaskedStore(*this, SI))
+    return eraseInstFromFunction(SI);
+
   // Replace GEP indices if possible.
   if (Instruction *NewGEPI = replaceGEPIdxWithZero(*this, Ptr, SI))
     return replaceOperand(SI, 1, NewGEPI);
diff --git a/llvm/test/Transforms/InstCombine/masked-store.ll b/llvm/test/Transforms/InstCombine/masked-store.ll
new file mode 100644
index 0000000000000..bbbf2587a35ef
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/masked-store.ll
@@ -0,0 +1,63 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=instcombine -S < %s | FileCheck %s
+
+define void @test_masked_store_success(<8 x i32> %x, ptr %ptr, <8 x i1> %cmp) {
+; CHECK-LABEL: define void @test_masked_store_success(
+; CHECK-SAME: <8 x i32> [[X:%.*]], ptr [[PTR:%.*]], <8 x i1> [[CMP:%.*]]) {
+; CHECK-NEXT:    call void @llvm.masked.store.v8i32.p0(<8 x i32> [[X]], ptr [[PTR]], i32 32, <8 x i1> [[CMP]])
+; CHECK-NEXT:    ret void
+;
+  %load = load <8 x i32>, ptr %ptr, align 32
+  %sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
+  store <8 x i32> %sel, ptr %ptr, align 32
+  ret void
+}
+
+define void @test_masked_store_volatile_load(<8 x i32> %x, ptr %ptr, <8 x i1> %cmp) {
+; CHECK-LABEL: define void @test_masked_store_volatile_load(
+; CHECK-SAME: <8 x i32> [[X:%.*]], ptr [[PTR:%.*]], <8 x i1> [[CMP:%.*]]) {
+; CHECK-NEXT:    [[LOAD:%.*]] = load volatile <8 x i32>, ptr [[PTR]], align 32
+; CHECK-NEXT:    [[SEL:%.*]] = select <8 x i1> [[CMP]], <8 x i32> [[X]], <8 x i32> [[LOAD]]
+; CHECK-NEXT:    store <8 x i32> [[SEL]], ptr [[PTR]], align 32
+; CHECK-NEXT:    ret void
+;
+  %load = load volatile <8 x i32>, ptr %ptr, align 32
+  %sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
+  store <8 x i32> %sel, ptr %ptr, align 32
+  ret void
+}
+
+define void @test_masked_store_volatile_store(<8 x i32> %x, ptr %ptr, <8 x i1> %cmp) {
+; CHECK-LABEL: define void @test_masked_store_volatile_store(
+; CHECK-SAME: <8 x i32> [[X:%.*]], ptr [[PTR:%.*]], <8 x i1> [[CMP:%.*]]) {
+; CHECK-NEXT:    [[LOAD:%.*]] = load <8 x i32>, ptr [[PTR]], align 32
+; CHECK-NEXT:    [[SEL:%.*]] = select <8 x i1> [[CMP]], <8 x i32> [[X]], <8 x i32> [[LOAD]]
+; CHECK-NEXT:    store volatile <8 x i32> [[SEL]], ptr [[PTR]], align 32
+; CHECK-NEXT:    ret void
+;
+  %load = load <8 x i32>, ptr %ptr, align 32
+  %sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
+  store volatile <8 x i32> %sel, ptr %ptr, align 32
+  ret void
+}
+
+declare void @use_vec(<8 x i32>)
+
+define void @test_masked_store_intervening(<8 x i32> %x, ptr %ptr, <8 x i1> %cmp) {
+; CHECK-LABEL: define void @test_masked_store_intervening(
+; CHECK-SAME: <8 x i32> [[X:%.*]], ptr [[PTR:%.*]], <8 x i1> [[CMP:%.*]]) {
+; CHECK-NEXT:    [[LOAD:%.*]] = load <8 x i32>, ptr [[PTR]], align 32
+; CHECK-NEXT:    store <8 x i32> zeroinitializer, ptr [[PTR]], align 32
+; CHECK-NEXT:    call void @use_vec(<8 x i32> zeroinitializer)
+; CHECK-NEXT:    [[SEL:%.*]] = select <8 x i1> [[CMP]], <8 x i32> [[X]], <8 x i32> [[LOAD]]
+; CHECK-NEXT:    store <8 x i32> [[SEL]], ptr [[PTR]], align 32
+; CHECK-NEXT:    ret void
+;
+  %load = load <8 x i32>, ptr %ptr, align 32
+  store <8 x i32> zeroinitializer, ptr %ptr, align 32
+  %tmp = load <8 x i32>, ptr %ptr
+  call void @use_vec(<8 x i32> %tmp)
+  %sel = select <8 x i1> %cmp, <8 x i32> %x, <8 x i32> %load
+  store <8 x i32> %sel, ptr %ptr, align 32
+  ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/if-conversion.ll b/llvm/test/Transforms/LoopVectorize/if-conversion.ll
index 8a7f4a386fda1..622726f6d1fe7 100644
--- a/llvm/test/Transforms/LoopVectorize/if-conversion.ll
+++ b/llvm/test/Transforms/LoopVectorize/if-conversion.ll
@@ -61,8 +61,7 @@ define i32 @function0(ptr nocapture %a, ptr nocapture %b, i32 %start, i32 %end)
 ; CHECK-NEXT:    [[DOTNOT:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], [[WIDE_LOAD4]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = mul <4 x i32> [[WIDE_LOAD]], splat (i32 5)
 ; CHECK-NEXT:    [[TMP16:%.*]] = add <4 x i32> [[TMP15]], splat (i32 3)
-; CHECK-NEXT:    [[PREDPHI:%.*]] = select <4 x i1> [[DOTNOT]], <4 x i32> [[TMP16]], <4 x i32> [[WIDE_LOAD]]
-; CHECK-NEXT:    store <4 x i32> [[PREDPHI]], ptr [[TMP13]], align 4, !alias.scope [[META0]], !noalias [[META3]]
+; CHECK-NEXT:    call void @llvm.masked.store.v4i32.p0(<4 x i32> [[TMP16]], ptr [[TMP13]], i32 4, <4 x i1> [[DOTNOT]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[TMP17]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]

@abhishek-kaushik22 abhishek-kaushik22 requested a review from e-kud June 16, 2025 06:02
Contributor

@nikic nikic left a comment


This is (extremely!) unprofitable for targets that don't have native masked stores. It will expand to a long chain of conditional stores.
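
For illustration, this is roughly the shape of code the ScalarizeMaskedMemIntrin pass emits for `llvm.masked.store` on targets without native support (a sketch of lane 0 only; exact output varies by target):

```llvm
; Lanes 1..7 repeat the same extract/branch/store sequence, so an
; <8 x i32> masked store becomes 8 conditional branches.
  %m0 = extractelement <8 x i1> %cmp, i64 0
  br i1 %m0, label %cond.store, label %else
cond.store:
  %e0 = extractelement <8 x i32> %x, i64 0
  %p0 = getelementptr inbounds i32, ptr %ptr, i64 0
  store i32 %e0, ptr %p0, align 32
  br label %else
else:
  ; ... lanes 1 through 7 ...
```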

@dtcxzyw
Member

dtcxzyw commented Jun 16, 2025

> This is (extremely!) unprofitable for targets that don't have native masked stores. It will expand to a long chain of conditional stores.

We might be able to perform this fold in AggressiveInstCombine, where TTI.hasConditionalLoadStoreForType is available.
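
A minimal sketch of what that TTI gating could look like (assuming the hook's signature; function and parameter names here are illustrative, not a worked-out patch):

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical profitability gate for AggressiveInstCombine: bail out
// unless the target reports native conditional/masked store support for
// the stored vector type.
static bool foldSelectStoreToMaskedStore(StoreInst &SI,
                                         const TargetTransformInfo &TTI,
                                         IRBuilder<> &Builder) {
  auto *Sel = dyn_cast<SelectInst>(SI.getValueOperand());
  if (!Sel || !Sel->getType()->isVectorTy())
    return false;
  // Signature assumed from the hook mentioned above; without native
  // support the masked store would be scalarized and is unprofitable.
  if (!TTI.hasConditionalLoadStoreForType(Sel->getType()))
    return false;
  // ... same load/select/store matching and safety scan as in the patch,
  //     then Builder.CreateMaskedStore(...) ...
  return true;
}
```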

@nikic
Contributor

nikic commented Jun 16, 2025

I don't think this fold should be performed in the middle-end at all.

@abhishek-kaushik22 abhishek-kaushik22 deleted the masked-store branch June 16, 2025 09:53