[InstCombine] Optimize sub(sext(add(x,y)),sext(add(x,z))). #144174

Open
wants to merge 7 commits into
base: main

vzakhari
Contributor

This pattern often occurs in Flang-generated LLVM IR,
for example in the trip counts of the loops generated for array
expressions like a(x:x+y) or a(x+z:x+z) and their variations.

To compute the loop count, Flang subtracts the lower bound
of the array slice from the upper bound. To avoid signed
wraparound, it first sign-extends the original values
(which may be of any user data type) to i64.

This peephole is particularly helpful in CPU2017/548.exchange2,
which contains multiple statements like this:

block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10

Although this is just a 2x3-iteration loop nest, LLVM cannot
figure that out and ends up aggressively vectorizing the inner
loop (with a vector epilogue and a scalar remainder). This, in turn,
causes problems for LSR, which creates too many loop-carried
values in the loop containing the above statement, and these
values then cause too many spills/reloads.

Alive2: https://alive2.llvm.org/ce/z/gLgfYX
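
The Alive2 link above is the authoritative proof. As a rough sanity check of the same idea, the sketch below models the fold on a smaller i8 -> i16 pair and tries every input. The helper names (`wrapSigned`, `subOfSExtAdds`, `subOfSExts`, `foldHoldsWhenAddsAreNSW`) are made up for this illustration and are not part of the patch; plain `int` arithmetic stands in for exact integers.

```cpp
#include <cassert>

// Wraps v into the signed range of a `bits`-wide two's complement integer.
int wrapSigned(int v, int bits) {
  assert(bits > 0 && bits < 31);
  const int half = 1 << (bits - 1);
  const int mod = 1 << bits;
  return ((v + half) % mod + mod) % mod - half;
}

// Before the fold: sub i16 (sext (add i8 x, y)), (sext (add i8 x, z)).
// sext preserves the value, so only the adds and the sub need wrapping.
int subOfSExtAdds(int x, int y, int z) {
  return wrapSigned(wrapSigned(x + y, 8) - wrapSigned(x + z, 8), 16);
}

// After the fold: sub i16 (sext y), (sext z).
int subOfSExts(int y, int z) { return wrapSigned(y - z, 16); }

// Checks every i8 triple for which both adds satisfy nsw.
bool foldHoldsWhenAddsAreNSW() {
  for (int x = -128; x <= 127; ++x) {
    for (int y = -128; y <= 127; ++y) {
      if (wrapSigned(x + y, 8) != x + y) continue;  // add nsw would be poison
      for (int z = -128; z <= 127; ++z) {
        if (wrapSigned(x + z, 8) != x + z) continue;  // add nsw would be poison
        if (subOfSExtAdds(x, y, z) != subOfSExts(y, z)) return false;
        // The narrow operands leave headroom, so the wide sub never wraps.
        if (subOfSExts(y, z) != y - z) return false;
      }
    }
  }
  return true;
}
```

Calling `foldHoldsWhenAddsAreNSW()` from a `main` exercises all 2^24 triples. In this model the `nsw` requirement on the adds is what makes the fold valid: with `x = 127, y = 1, z = 0` the first add wraps, and `subOfSExtAdds` and `subOfSExts` disagree.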

Related to #143219.

@llvmbot
Member

llvmbot commented Jun 14, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Slava Zakharin (vzakhari)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/144174.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+56)
  • (added) llvm/test/Transforms/InstCombine/sub-sext-add.ll (+89)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index c1ce364eb1794..35de76d50672d 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -2807,6 +2807,62 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
   if (Instruction *Res = foldBinOpOfSelectAndCastOfSelectCondition(I))
     return Res;
 
+  // (sub[ nsw][ nuw] (sext (add nsw (X, Y)), sext (X))) --> (sext (Y))
+  {
+    Value *Add0;
+    if (match(Op0, m_SExt(m_Value(Add0))) &&
+        match(Add0, m_Add(m_Value(X), m_Value(Y))) &&
+        match(Op1, m_SExt(m_Specific(X)))) {
+      auto *OBO0 = cast<OverflowingBinaryOperator>(Add0);
+      if (OBO0->hasNoSignedWrap()) {
+        // Non-constant Y requires new SExt.
+        unsigned numOfNewInstrs = !isa<Constant>(Y) ? 1 : 0;
+        // Check if we can trade some of the old instructions for the new ones.
+        unsigned numOfDeadInstrs = 0;
+        numOfDeadInstrs += Op0->hasOneUse() ? 1 : 0;
+        numOfDeadInstrs += Op1->hasOneUse() ? 1 : 0;
+        numOfDeadInstrs += Add0->hasOneUse() ? 1 : 0;
+        if (numOfDeadInstrs >= numOfNewInstrs) {
+          Value *SExtY = Builder.CreateSExt(Y, I.getType());
+          return replaceInstUsesWith(I, SExtY);
+        }
+      }
+    }
+  }
+
+  // (sub[ nsw] (sext (add nsw (X, Y)), sext (add nsw (X, Z)))) -->
+  // --> (sub[ nsw] (sext (Y), sext(Z)))
+  {
+    Value *Z, *Add0, *Add1;
+    if (match(Op0, m_SExt(m_Value(Add0))) &&
+        match(Add0, m_Add(m_Value(X), m_Value(Y))) &&
+        match(Op1, m_SExt(m_Value(Add1))) &&
+        match(Add1, m_Add(m_Specific(X), m_Value(Z)))) {
+      auto *OBO0 = cast<OverflowingBinaryOperator>(Add0);
+      auto *OBO1 = cast<OverflowingBinaryOperator>(Add1);
+      if (OBO0->hasNoSignedWrap() && OBO1->hasNoSignedWrap()) {
+        unsigned numOfNewInstrs = 0;
+        // Non-constant Y, Z require new SExt.
+        numOfNewInstrs += !isa<Constant>(Y) ? 1 : 0;
+        numOfNewInstrs += !isa<Constant>(Z) ? 1 : 0;
+        // Check if we can trade some of the old instructions for the new ones.
+        unsigned numOfDeadInstrs = 0;
+        numOfDeadInstrs += Op0->hasOneUse() ? 1 : 0;
+        numOfDeadInstrs += Op1->hasOneUse() ? 1 : 0;
+        numOfDeadInstrs += Add0->hasOneUse() ? 1 : 0;
+        numOfDeadInstrs += Add1->hasOneUse() ? 1 : 0;
+        if (numOfDeadInstrs >= numOfNewInstrs) {
+          Value *SExtY = Builder.CreateSExt(Y, I.getType());
+          Value *SExtZ = Builder.CreateSExt(Z, I.getType());
+          Value *Sub = Builder.CreateSub(SExtY, SExtZ, "",
+                                         /* HasNUW */ false,
+                                         /* HasNSW */ I.hasNoSignedWrap());
+          return replaceInstUsesWith(I, Sub);
+        }
+      }
+    }
+  }
+
   return TryToNarrowDeduceFlags();
 }
 
diff --git a/llvm/test/Transforms/InstCombine/sub-sext-add.ll b/llvm/test/Transforms/InstCombine/sub-sext-add.ll
new file mode 100644
index 0000000000000..8b12acdf95ba5
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/sub-sext-add.ll
@@ -0,0 +1,89 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i64 @src_2add_2sext_sub(i32 %x, i32 %y, i32 %z) {
+; CHECK-LABEL: define i64 @src_2add_2sext_sub(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]], i32 [[Z:%.*]]) {
+; CHECK-NEXT:    [[SEXT1:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    [[SEXT2:%.*]] = sext i32 [[Z]] to i64
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw i64 [[SEXT1]], [[SEXT2]]
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %add2 = add nsw i32 %x, %z
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %add2 to i64
+  %sub = sub i64 %sext1, %sext2
+  ret i64 %sub
+}
+
+define i64 @src_2add_2sext_sub_nsw(i32 %x, i32 %y, i32 %z) {
+; CHECK-LABEL: define i64 @src_2add_2sext_sub_nsw(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]], i32 [[Z:%.*]]) {
+; CHECK-NEXT:    [[SEXT1:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    [[SEXT2:%.*]] = sext i32 [[Z]] to i64
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw i64 [[SEXT1]], [[SEXT2]]
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %add2 = add nsw i32 %x, %z
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %add2 to i64
+  %sub = sub nsw i64 %sext1, %sext2
+  ret i64 %sub
+}
+
+define i64 @src_2add_2sext_sub_nuw(i32 %x, i32 %y, i32 %z) {
+; CHECK-LABEL: define i64 @src_2add_2sext_sub_nuw(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]], i32 [[Z:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    [[TMP2:%.*]] = sext i32 [[Z]] to i64
+; CHECK-NEXT:    [[SUB:%.*]] = sub nsw i64 [[TMP1]], [[TMP2]]
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %add2 = add nsw i32 %x, %z
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %add2 to i64
+  %sub = sub nuw i64 %sext1, %sext2
+  ret i64 %sub
+}
+
+define i64 @src_x_add_2sext_sub(i32 %x, i32 %y) {
+; CHECK-LABEL: define i64 @src_x_add_2sext_sub(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT:    [[SUB:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %x to i64
+  %sub = sub i64 %sext1, %sext2
+  ret i64 %sub
+}
+
+define i64 @src_x_add_2sext_sub_nsw(i32 %x, i32 %y) {
+; CHECK-LABEL: define i64 @src_x_add_2sext_sub_nsw(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT:    [[SUB:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %x to i64
+  %sub = sub nsw i64 %sext1, %sext2
+  ret i64 %sub
+}
+
+define i64 @src_x_add_2sext_sub_nuw(i32 %x, i32 %y) {
+; CHECK-LABEL: define i64 @src_x_add_2sext_sub_nuw(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT:    [[SUB:%.*]] = sext i32 [[Y]] to i64
+; CHECK-NEXT:    ret i64 [[SUB]]
+;
+  %add1 = add nsw i32 %x, %y
+  %sext1 = sext i32 %add1 to i64
+  %sext2 = sext i32 %x to i64
+  %sub = sub nuw i64 %sext1, %sext2
+  ret i64 %sub
+}
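
Both folds in the diff above gate the rewrite on a simple instruction count: the number of old instructions that might die must be at least the number of new `sext`s. A minimal standalone model of that check for the second fold follows. `MatchedPattern` and `shouldFold` are hypothetical names for this sketch, and the booleans stand in for the `hasOneUse()` and `isa<Constant>()` queries in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts the fold queries on the matched IR.
struct MatchedPattern {
  bool sext0OneUse;  // Op0->hasOneUse()
  bool sext1OneUse;  // Op1->hasOneUse()
  bool add0OneUse;   // Add0->hasOneUse()
  bool add1OneUse;   // Add1->hasOneUse()
  bool yIsConstant;  // isa<Constant>(Y)
  bool zIsConstant;  // isa<Constant>(Z)
};

// Mirrors the counting in the second fold: a non-constant Y or Z costs one
// new sext each, and every one-use sext/add is counted as one dead
// instruction. The new sub replaces the old sub, so it is not counted.
bool shouldFold(const MatchedPattern &p) {
  const unsigned newInstrs = (p.yIsConstant ? 0u : 1u) + (p.zIsConstant ? 0u : 1u);
  const unsigned deadInstrs = (p.sext0OneUse ? 1u : 0u) + (p.sext1OneUse ? 1u : 0u) +
                              (p.add0OneUse ? 1u : 0u) + (p.add1OneUse ? 1u : 0u);
  return deadInstrs >= newInstrs;
}

// A few cases, checked with assert.
void checkExamples() {
  assert(shouldFold({true, true, true, true, false, false}));       // 4 dead vs 2 new
  assert(!shouldFold({false, false, false, false, false, false}));  // 0 dead vs 2 new
  assert(shouldFold({false, false, false, false, true, true}));     // 0 dead vs 0 new
  assert(!shouldFold({true, false, false, false, false, false}));   // 1 dead vs 2 new
}
```

One thing the model makes visible: an `add` is counted as dead whenever it has a single use, even if that use is a `sext` with other users that therefore stays alive, so the count can be optimistic.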

@vzakhari vzakhari requested a review from topperc June 14, 2025 03:04
}

// (sub[ nsw] (sext (add nsw (X, Y)), sext (add nsw (X, Z)))) -->
// --> (sub[ nsw] (sext (Y), sext(Z)))
Suggested change
// --> (sub[ nsw] (sext (Y), sext(Z)))
// --> (sub[ nsw] (sext (Y), sext (Z)))

Comment on lines 2841 to 2842
/* HasNUW */ false,
/* HasNSW */ I.hasNoSignedWrap());
Suggested change
/* HasNUW */ false,
/* HasNSW */ I.hasNoSignedWrap());
/*HasNUW=*/false,
/*HasNSW=*/I.hasNoSignedWrap());
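
For context on why the `CreateSub` call under review passes `false` for NUW: the `nuw` flag of the original `sub` does not carry over to the new one, which matches the `src_2add_2sext_sub_nuw` test where the output has no `nuw`. The sketch below shows one counterexample on an i8 -> i16 model. `sextI8ToI16`, `subHasNoUnsignedWrap`, `NuwCheck`, and `checkNuw` are names invented for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// sext i8 -> i16, returned as the unsigned bit pattern of the result.
uint16_t sextI8ToI16(int8_t v) {
  return static_cast<uint16_t>(static_cast<int16_t>(v));
}

// `sub nuw a, b` is only free of unsigned wrap when a >= b as unsigned.
bool subHasNoUnsignedWrap(uint16_t a, uint16_t b) { return a >= b; }

struct NuwCheck {
  bool before;  // nuw would hold on sub (sext (x + y)), (sext (x + z))
  bool after;   // nuw would hold on sub (sext y), (sext z)
};

NuwCheck checkNuw(int8_t x, int8_t y, int8_t z) {
  // Precondition: both adds satisfy nsw, i.e. the sums fit in i8.
  assert(x + y >= -128 && x + y <= 127);
  assert(x + z >= -128 && x + z <= 127);
  const int8_t add0 = static_cast<int8_t>(x + y);
  const int8_t add1 = static_cast<int8_t>(x + z);
  return {subHasNoUnsignedWrap(sextI8ToI16(add0), sextI8ToI16(add1)),
          subHasNoUnsignedWrap(sextI8ToI16(y), sextI8ToI16(z))};
}
```

With `x = -1, y = 0, z = 1`, the original operands are `0xFFFF` and `0`, so the subtraction has no unsigned wrap, while the folded operands are `0` and `1`, so it does. Tagging the new `sub` with `nuw` would therefore turn a defined result into poison.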

@vzakhari vzakhari requested a review from dtcxzyw June 16, 2025 16:27