
Commit 7025ac8

[X86] Don't elide argument copies for scalarized vectors (PR63475)
When eliding argument copies, the in-memory layout produced by a plain store of the type must match the layout that the argument lowering produces on the stack. For multi-part argument lowerings, this is not necessarily the case.

The code already tried to prevent this optimization for "scalarized and extended" vectors, but the check for "extended" was incomplete. While a scalarized vector of i32s stores i32 values on the stack, these are stored in 8-byte stack slots (on x86_64), so they effectively have padding between them.

Rather than trying to add more special cases to handle this (which is not straightforward), I'm going in the other direction and excluding scalarized vectors from this optimization entirely. This seems like a rare case that is not worth the hassle, and the complete lack of test coverage is not reassuring either.

Fixes #63475.

Differential Revision: https://reviews.llvm.org/D154078
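The layout mismatch described in the commit message can be sketched in a few lines of standalone C++. This is a simplified model, not LLVM code: the helper names `packedOffset`, `slotOffset`, and `layoutsMatch` are hypothetical, and the model assumes every scalarized part occupies exactly one fixed-size stack slot.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Byte offset of element `i` when the vector is stored as a packed array,
// which is the layout a plain store of the vector type produces.
constexpr std::size_t packedOffset(std::size_t i, std::size_t elemBytes) {
  return i * elemBytes;
}

// Byte offset of element `i` when each element is passed as its own stack
// argument and every argument occupies one slot of `slotBytes` bytes.
constexpr std::size_t slotOffset(std::size_t i, std::size_t slotBytes) {
  return i * slotBytes;
}

// True if both layouts place all `n` elements at the same offsets, i.e. if
// the argument area could stand in for a packed copy of the vector.
constexpr bool layoutsMatch(std::size_t n, std::size_t elemBytes,
                            std::size_t slotBytes) {
  for (std::size_t i = 0; i < n; ++i)
    if (packedOffset(i, elemBytes) != slotOffset(i, slotBytes))
      return false;
  return true;
}

// <7 x i32> scalarized into 8-byte slots: element 1 sits at byte 4 in the
// packed layout but at byte 8 on the stack, so the layouts disagree.
static_assert(packedOffset(1, 4) == 4 && slotOffset(1, 8) == 8,
              "element 1 lands at different offsets");
static_assert(!layoutsMatch(7, 4, 8), "4-byte elements in 8-byte slots");
```

In this model the layouts only agree when the element size equals the slot size. The commit does not rely on that special case and excludes scalarized vectors from copy elision altogether.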
1 parent f78a06e commit 7025ac8

2 files changed: +35 -24 lines changed


llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 5 additions & 8 deletions
@@ -3851,21 +3851,18 @@ X86TargetLowering::LowerMemArgument(SDValue Chain, CallingConv::ID CallConv,
 
   EVT ArgVT = Ins[i].ArgVT;
 
-  // If this is a vector that has been split into multiple parts, and the
-  // scalar size of the parts don't match the vector element size, then we can't
-  // elide the copy. The parts will have padding between them instead of being
-  // packed like a vector.
-  bool ScalarizedAndExtendedVector =
-      ArgVT.isVector() && !VA.getLocVT().isVector() &&
-      VA.getLocVT().getSizeInBits() != ArgVT.getScalarSizeInBits();
+  // If this is a vector that has been split into multiple parts, don't elide
+  // the copy. The layout on the stack may not match the packed in-memory
+  // layout.
+  bool ScalarizedVector = ArgVT.isVector() && !VA.getLocVT().isVector();
 
   // This is an argument in memory. We might be able to perform copy elision.
   // If the argument is passed directly in memory without any extension, then we
   // can perform copy elision. Large vector types, for example, may be passed
   // indirectly by pointer.
   if (Flags.isCopyElisionCandidate() &&
       VA.getLocInfo() != CCValAssign::Indirect && !ExtendedInMem &&
-      !ScalarizedAndExtendedVector) {
+      !ScalarizedVector) {
     SDValue PartAddr;
     if (Ins[i].PartOffset == 0) {
       // If this is a one-part value or the first part of a multi-part value,
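To see why the old predicate let the `<7 x i32>` case through, the two checks can be modeled outside of LLVM. The sketch below is an illustration under stated assumptions: `ArgModel` and its fields are hypothetical stand-ins for the `EVT`/`MVT` queries in the diff, and the sample value assumes the vector is split into i32 parts, as the commit message describes.

```cpp
// Hypothetical stand-in for the properties the check reads; the real code
// queries ArgVT and VA.getLocVT() on LLVM value-type objects.
struct ArgModel {
  bool argIsVector;        // models ArgVT.isVector()
  bool locIsVector;        // models VA.getLocVT().isVector()
  unsigned locBits;        // models VA.getLocVT().getSizeInBits()
  unsigned argScalarBits;  // models ArgVT.getScalarSizeInBits()
};

// Old predicate: only blocks elision when the part type is wider or narrower
// than the vector element type.
constexpr bool oldCheck(const ArgModel &a) {
  return a.argIsVector && !a.locIsVector && a.locBits != a.argScalarBits;
}

// New predicate: blocks elision for every scalarized vector.
constexpr bool newCheck(const ArgModel &a) {
  return a.argIsVector && !a.locIsVector;
}

// Assumed model of <7 x i32> passed as i32 parts: the part size equals the
// element size, so the old check does not fire even though each part lives
// in its own 8-byte stack slot.
constexpr ArgModel sevenI32{true, false, 32, 32};
// A plain scalar i32 argument, which neither check should flag.
constexpr ArgModel scalarI32{false, false, 32, 32};

static_assert(!oldCheck(sevenI32), "old check misses the padded case");
static_assert(newCheck(sevenI32), "new check excludes it from elision");
static_assert(!oldCheck(scalarI32) && !newCheck(scalarI32),
              "scalars are unaffected by either check");
```

The new predicate is strictly more conservative: whenever the old check returns true, the new one does too, so no previously blocked case becomes eligible for elision.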

llvm/test/CodeGen/X86/pr63475.ll

Lines changed: 30 additions & 16 deletions
@@ -27,7 +27,8 @@ define void @caller() nounwind {
   ret void
 }
 
-; FIXME: This is a miscompile.
+; Make sure the stack offsets are correct. The distance between them should
+; be 8, not 4.
 define void @callee(ptr %p0, ptr %p1, ptr %p2, ptr %p3, ptr %p4, ptr %p5, <7 x i32> %arg) nounwind {
 ; CHECK-LABEL: callee:
 ; CHECK: # %bb.0: # %start
@@ -37,28 +38,41 @@ define void @callee(ptr %p0, ptr %p1, ptr %p2, ptr %p3, ptr %p4, ptr %p5, <7 x i
 ; CHECK-NEXT: pushq %r13
 ; CHECK-NEXT: pushq %r12
 ; CHECK-NEXT: pushq %rbx
-; CHECK-NEXT: pushq %rax
-; CHECK-NEXT: movl 112(%rsp), %ebx
-; CHECK-NEXT: movl 104(%rsp), %ebp
-; CHECK-NEXT: movl 96(%rsp), %r14d
-; CHECK-NEXT: movl 76(%rsp), %r15d
-; CHECK-NEXT: movl 72(%rsp), %r12d
-; CHECK-NEXT: movl 64(%rsp), %edi
-; CHECK-NEXT: movl 68(%rsp), %r13d
-; CHECK-NEXT: callq use@PLT
-; CHECK-NEXT: movl %r13d, %edi
-; CHECK-NEXT: callq use@PLT
-; CHECK-NEXT: movl %r12d, %edi
+; CHECK-NEXT: subq $40, %rsp
+; CHECK-NEXT: movl 120(%rsp), %ebx
+; CHECK-NEXT: movd %ebx, %xmm0
+; CHECK-NEXT: movl 112(%rsp), %ebp
+; CHECK-NEXT: movd %ebp, %xmm1
+; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT: movl 104(%rsp), %r15d
+; CHECK-NEXT: movd %r15d, %xmm0
+; CHECK-NEXT: movl 96(%rsp), %edi
+; CHECK-NEXT: movd %edi, %xmm2
+; CHECK-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
+; CHECK-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm1[0]
+; CHECK-NEXT: movl 136(%rsp), %r14d
+; CHECK-NEXT: movd %r14d, %xmm0
+; CHECK-NEXT: movl 128(%rsp), %r12d
+; CHECK-NEXT: movd %r12d, %xmm1
+; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
+; CHECK-NEXT: movl 144(%rsp), %r13d
+; CHECK-NEXT: movl %r13d, 36(%rsp)
+; CHECK-NEXT: movq %xmm1, 28(%rsp)
+; CHECK-NEXT: movdqu %xmm2, 12(%rsp)
 ; CHECK-NEXT: callq use@PLT
 ; CHECK-NEXT: movl %r15d, %edi
 ; CHECK-NEXT: callq use@PLT
-; CHECK-NEXT: movl %r14d, %edi
-; CHECK-NEXT: callq use@PLT
 ; CHECK-NEXT: movl %ebp, %edi
 ; CHECK-NEXT: callq use@PLT
 ; CHECK-NEXT: movl %ebx, %edi
 ; CHECK-NEXT: callq use@PLT
-; CHECK-NEXT: addq $8, %rsp
+; CHECK-NEXT: movl %r12d, %edi
+; CHECK-NEXT: callq use@PLT
+; CHECK-NEXT: movl %r14d, %edi
+; CHECK-NEXT: callq use@PLT
+; CHECK-NEXT: movl %r13d, %edi
+; CHECK-NEXT: callq use@PLT
+; CHECK-NEXT: addq $40, %rsp
 ; CHECK-NEXT: popq %rbx
 ; CHECK-NEXT: popq %r12
 ; CHECK-NEXT: popq %r13
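The stack offsets in the CHECK lines can be sanity-checked against the 8-byte spacing the test comment asks for. The sketch below copies the `movl N(%rsp)` load offsets from the removed and added lines and sorts them ascending. It assumes that each of these loads reads one vector element and that the lowest offset holds element 0; the helper name `oneLoadPerSlot` is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Load offsets taken from the CHECK lines in the diff, sorted ascending.
constexpr std::array<int, 7> oldOffsets{64, 68, 72, 76, 96, 104, 112};
constexpr std::array<int, 7> newOffsets{96, 104, 112, 120, 128, 136, 144};

// True if the loads read one value from each consecutive slot of
// `slotBytes` bytes, starting at the lowest offset.
constexpr bool oneLoadPerSlot(const std::array<int, 7> &offsets,
                              int slotBytes) {
  for (std::size_t i = 0; i < offsets.size(); ++i)
    if (offsets[i] != offsets[0] + static_cast<int>(i) * slotBytes)
      return false;
  return true;
}

// After the fix, every element is read from its own 8-byte slot.
static_assert(oneLoadPerSlot(newOffsets, 8), "fixed code steps by 8");
// Before the fix, the first four loads step by 4 bytes, so some of them
// read from the middle of a slot instead of from the next argument.
static_assert(!oneLoadPerSlot(oldOffsets, 8), "old code mixes 4 and 8");
```

Under these assumptions the old output reads elements 1 to 3 from offsets 68, 72, and 76, where an 8-byte slot layout starting at 64 would place them at 72, 80, and 88.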
