
[AMDGPU] Illegal VGPR to SGPR copy #144008


Open · wants to merge 1 commit into main
Conversation

isakhilesh (Contributor)

This patch resolves an instance of an illegal VGPR to SGPR copy by invoking BuildMI() when the source register is a VGPR and the destination register is an SGPR. Since data cannot be copied directly from a VGPR to an SGPR, we use AMDGPU::V_READFIRSTLANE_B32.

Fixes SWDEV-530052.

@llvmbot (Member)

llvmbot commented Jun 13, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Akhilesh Moorthy (isakhilesh)



Full diff: https://github.com/llvm/llvm-project/pull/144008.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+8-1)
  • (renamed) llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill.ll (+11-3)
  • (modified) llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll (+3-7)
  • (modified) llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll (+98-5)
  • (added) llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-single.ll (+38)
  • (added) llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-vector.ll (+42)
  • (modified) llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll (+18-63)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 85276bd24bcf4..00f48af5d9e22 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -875,7 +875,14 @@ void SIInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
     }
 
     if (!AMDGPU::SReg_32RegClass.contains(SrcReg)) {
-      reportIllegalCopy(this, MBB, MI, DL, DestReg, SrcReg, KillSrc);
+      // We invoke BuildMI() only when we have verified that the source register
+      // is a VGPR and the destination register is a SGPR, and since we cannot
+      // transfer data directly from VGPR to SGPR, we use
+      // AMDGPU::V_READFIRSTLANE_B32
+      assert(AMDGPU::SReg_32RegClass.contains(DestReg));
+      assert(AMDGPU::VGPR_32RegClass.contains(SrcReg));
+      BuildMI(MBB, MI, DL, this->get(AMDGPU::V_READFIRSTLANE_B32), DestReg)
+          .addReg(SrcReg);
       return;
     }
 
diff --git a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll b/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill.ll
similarity index 72%
rename from llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
rename to llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill.ll
index 34f4476f7fd6a..4b22b51cc7df2 100644
--- a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
+++ b/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill.ll
@@ -1,6 +1,4 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 -filetype=null %s 2>&1 | FileCheck -enable-var-scope %s
-
-; CHECK: illegal VGPR to SGPR copy
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck -enable-var-scope %s
 
 declare hidden void @external_void_func_a15i32_inreg([15 x i32] inreg) #0
 declare hidden void @external_void_func_a16i32_inreg([16 x i32] inreg) #0
@@ -25,3 +23,13 @@ attributes #0 = { nounwind }
 
 !llvm.module.flags = !{!0}
 !0 = !{i32 1, !"amdhsa_code_object_version", i32 400}
+; CHECK: v_readlane_b32
+; CHECK: s_mov_b32
+; CHECK: v_writelane_b32
+; CHECK: s_swappc_b64
+; CHECK: s_or_saveexec_b64
+; CHECK: buffer_load_dword
+; CHECK: s_waitcnt
+; CHECK: s_addk_i32
+; CHECK: v_readfirstlane_b32
+; CHECK: s_mov_b64
\ No newline at end of file
diff --git a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll b/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
index 597f90c0f4e84..28be32fabab3c 100644
--- a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
+++ b/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
@@ -1,8 +1,6 @@
 ; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=ERR %s
 ; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=GCN %s
 
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v1 to s9
 
 define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_i32() #0 {
   %vgpr = call i32 asm sideeffect "; def $0", "=${v1}"()
@@ -42,9 +40,7 @@ define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v16i32() #0 {
   ret void
 }
 
-; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: v_accvgpr_read_b32 [[COPY1:v[0-9]+]], a1
-; GCN: ; illegal copy [[COPY1]] to s9
+
 define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_i32() #1 {
   %agpr = call i32 asm sideeffect "; def $0", "=${a1}"()
   call void asm sideeffect "; use $0", "${s9}"(i32 %agpr)
@@ -54,7 +50,7 @@ define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_i32() #1 {
 ; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_v2i32 void (): illegal VGPR to SGPR copy
 ; GCN-DAG: v_accvgpr_read_b32 v[[COPY1L:[0-9]+]], a0
 ; GCN-DAG: v_accvgpr_read_b32 v[[COPY1H:[0-9]+]], a1
-; GCN: ; illegal copy v[[[COPY1L]]:[[COPY1H]]] to s[10:11]
+; GCN: ; illegal copy v[0:1] to s[10:11]
 define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_v2i32() #1 {
   %vgpr = call <2 x i32> asm sideeffect "; def $0", "=${a[0:1]}"()
   call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
@@ -62,4 +58,4 @@ define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_v2i32() #1 {
 }
 
 attributes #0 = { nounwind }
-attributes #1 = { nounwind "target-cpu"="gfx908" }
+attributes #1 = { nounwind "target-cpu"="gfx908" }
\ No newline at end of file
diff --git a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll b/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
index f0b3d334af67d..820d35de9e1e0 100644
--- a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
@@ -1,14 +1,107 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 -O0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 -O0 2> %t.err < %s | FileCheck %s
 
 ; FIXME: This error will be fixed by supporting arbitrary divergent
 ; dynamic allocas by performing a wave umax of the size.
 
-; ERR: error: <unknown>:0:0: in function move_to_valu_assert_srd_is_physreg_swdev503538 i32 (ptr addrspace(1)): illegal VGPR to SGPR copy
-
-; CHECK: ; illegal copy v0 to s32
 
 define i32 @move_to_valu_assert_srd_is_physreg_swdev503538(ptr addrspace(1) %ptr) {
+; CHECK-LABEL: move_to_valu_assert_srd_is_physreg_swdev503538:
+; CHECK:       ; %bb.0: ; %entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_mov_b32 s7, s33
+; CHECK-NEXT:    s_mov_b32 s33, s32
+; CHECK-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; CHECK-NEXT:    buffer_store_dword v3, off, s[0:3], s33 ; 4-byte Folded Spill
+; CHECK-NEXT:    s_mov_b64 exec, s[4:5]
+; CHECK-NEXT:    s_add_i32 s32, s32, 0x400
+; CHECK-NEXT:    v_mov_b32_e32 v2, v1
+; CHECK-NEXT:    ; implicit-def: $sgpr4
+; CHECK-NEXT:    ; implicit-def: $sgpr4
+; CHECK-NEXT:    ; kill: def $vgpr0 killed $vgpr0 def $vgpr0_vgpr1 killed $exec
+; CHECK-NEXT:    ; kill: def $vgpr1 killed $vgpr2 killed $exec
+; CHECK-NEXT:    ; implicit-def: $sgpr4_sgpr5
+; CHECK-NEXT:    v_mov_b32_e32 v0, s32
+; CHECK-NEXT:    v_accvgpr_write_b32 a0, v0 ; Reload Reuse
+; CHECK-NEXT:    v_readfirstlane_b32 s32, v0
+; CHECK-NEXT:    v_accvgpr_write_b32 a1, v0 ; Reload Reuse
+; CHECK-NEXT:    ; implicit-def: $sgpr4
+; CHECK-NEXT:    s_mov_b64 s[4:5], exec
+; CHECK-NEXT:    ; implicit-def: $vgpr3 : SGPR spill to VGPR lane
+; CHECK-NEXT:    v_writelane_b32 v3, s4, 0
+; CHECK-NEXT:    v_writelane_b32 v3, s5, 1
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_write_b32 a2, v3 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:  .LBB0_1: ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_read_b32 v3, a2 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    v_accvgpr_read_b32 v0, a0 ; Reload Reuse
+; CHECK-NEXT:    v_readfirstlane_b32 s4, v0
+; CHECK-NEXT:    v_writelane_b32 v3, s4, 2
+; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], s4, v0
+; CHECK-NEXT:    s_and_saveexec_b64 s[4:5], s[4:5]
+; CHECK-NEXT:    v_writelane_b32 v3, s4, 3
+; CHECK-NEXT:    v_writelane_b32 v3, s5, 4
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_write_b32 a2, v3 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:  ; %bb.2: ; in Loop: Header=BB0_1 Depth=1
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_read_b32 v3, a2 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    v_readlane_b32 s4, v3, 3
+; CHECK-NEXT:    v_readlane_b32 s5, v3, 4
+; CHECK-NEXT:    v_readlane_b32 s6, v3, 2
+; CHECK-NEXT:    s_nop 4
+; CHECK-NEXT:    buffer_load_dword v0, off, s[0:3], s6
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    v_accvgpr_write_b32 a3, v0 ; Reload Reuse
+; CHECK-NEXT:    s_xor_b64 exec, exec, s[4:5]
+; CHECK-NEXT:    s_cbranch_execnz .LBB0_1
+; CHECK-NEXT:  ; %bb.3:
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_read_b32 v3, a2 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    v_readlane_b32 s4, v3, 0
+; CHECK-NEXT:    v_readlane_b32 s5, v3, 1
+; CHECK-NEXT:    s_mov_b64 exec, s[4:5]
+; CHECK-NEXT:    s_mov_b32 s4, 0
+; CHECK-NEXT:    v_writelane_b32 v3, s4, 5
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_write_b32 a2, v3 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:  .LBB0_4: ; %loadstoreloop
+; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_or_saveexec_b64 s[10:11], -1
+; CHECK-NEXT:    v_accvgpr_read_b32 v3, a2 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    v_readlane_b32 s4, v3, 5
+; CHECK-NEXT:    v_accvgpr_read_b32 v0, a1 ; Reload Reuse
+; CHECK-NEXT:    v_add_u32_e64 v1, v0, s4
+; CHECK-NEXT:    v_mov_b32_e32 v0, 0
+; CHECK-NEXT:    buffer_store_byte v0, v1, s[0:3], 0 offen
+; CHECK-NEXT:    s_mov_b32 s5, 1
+; CHECK-NEXT:    s_add_i32 s4, s4, s5
+; CHECK-NEXT:    s_mov_b32 s5, 0x800
+; CHECK-NEXT:    s_cmp_lt_u32 s4, s5
+; CHECK-NEXT:    v_writelane_b32 v3, s4, 5
+; CHECK-NEXT:    s_mov_b64 s[10:11], exec
+; CHECK-NEXT:    s_mov_b64 exec, -1
+; CHECK-NEXT:    v_accvgpr_write_b32 a2, v3 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b64 exec, s[10:11]
+; CHECK-NEXT:    s_cbranch_scc1 .LBB0_4
+; CHECK-NEXT:  ; %bb.5: ; %Flow
+; CHECK-NEXT:  ; %bb.6: ; %split
+; CHECK-NEXT:    v_accvgpr_read_b32 v0, a3 ; Reload Reuse
+; CHECK-NEXT:    s_mov_b32 s32, s33
+; CHECK-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; CHECK-NEXT:    buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
+; CHECK-NEXT:    s_mov_b64 exec, s[4:5]
+; CHECK-NEXT:    s_mov_b32 s33, s7
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
 entry:
   %idx = load i32, ptr addrspace(1) %ptr, align 4
   %zero = extractelement <4 x i32> zeroinitializer, i32 %idx
diff --git a/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-single.ll b/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-single.ll
new file mode 100644
index 0000000000000..0d898bd5cd8c3
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-single.ll
@@ -0,0 +1,38 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck %s
+
+declare void @test_buffer_load_sgpr_plus_imm_offset_noflags(i32 inreg)
+
+define void @test_load_zext(i32 %foo) {
+; CHECK-LABEL: test_load_zext:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_mov_b32 s0, s33
+; CHECK-NEXT:    s_mov_b32 s33, s32
+; CHECK-NEXT:    s_or_saveexec_b64 s[2:3], -1
+; CHECK-NEXT:    scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
+; CHECK-NEXT:    s_mov_b64 exec, s[2:3]
+; CHECK-NEXT:    s_add_i32 s32, s32, 16
+; CHECK-NEXT:    v_writelane_b32 v40, s0, 2
+; CHECK-NEXT:    s_getpc_b64 s[0:1]
+; CHECK-NEXT:    s_add_u32 s0, s0, test_buffer_load_sgpr_plus_imm_offset_noflags@gotpcrel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s1, s1, test_buffer_load_sgpr_plus_imm_offset_noflags@gotpcrel32@hi+12
+; CHECK-NEXT:    s_load_dwordx2 s[2:3], s[0:1], 0x0
+; CHECK-NEXT:    v_writelane_b32 v40, s30, 0
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    v_writelane_b32 v40, s31, 1
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[2:3]
+; CHECK-NEXT:    v_readlane_b32 s31, v40, 1
+; CHECK-NEXT:    v_readlane_b32 s30, v40, 0
+; CHECK-NEXT:    s_mov_b32 s32, s33
+; CHECK-NEXT:    v_readlane_b32 s0, v40, 2
+; CHECK-NEXT:    s_or_saveexec_b64 s[2:3], -1
+; CHECK-NEXT:    scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
+; CHECK-NEXT:    s_mov_b64 exec, s[2:3]
+; CHECK-NEXT:    s_mov_b32 s33, s0
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void @test_buffer_load_sgpr_plus_imm_offset_noflags(i32 %foo)
+  ret void
+ }
diff --git a/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-vector.ll b/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-vector.ll
new file mode 100644
index 0000000000000..63ecba2d56451
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/swdev530052-illegal-vgpr-to-sgpr-copy-vector.ll
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx942 < %s | FileCheck %s
+
+define void @test_load_zext(<4 x i32> %LGV) {
+; CHECK-LABEL: test_load_zext:
+; CHECK:       ; %bb.0: ; %.entry
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_mov_b32 s0, s33
+; CHECK-NEXT:    s_mov_b32 s33, s32
+; CHECK-NEXT:    s_or_saveexec_b64 s[2:3], -1
+; CHECK-NEXT:    scratch_store_dword off, v40, s33 ; 4-byte Folded Spill
+; CHECK-NEXT:    s_mov_b64 exec, s[2:3]
+; CHECK-NEXT:    s_add_i32 s32, s32, 16
+; CHECK-NEXT:    v_writelane_b32 v40, s0, 2
+; CHECK-NEXT:    s_getpc_b64 s[0:1]
+; CHECK-NEXT:    s_add_u32 s0, s0, test_buffer_load_sgpr_plus_imm_offset_noflags@gotpcrel32@lo+4
+; CHECK-NEXT:    s_addc_u32 s1, s1, test_buffer_load_sgpr_plus_imm_offset_noflags@gotpcrel32@hi+12
+; CHECK-NEXT:    s_load_dwordx2 s[16:17], s[0:1], 0x0
+; CHECK-NEXT:    v_writelane_b32 v40, s30, 0
+; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
+; CHECK-NEXT:    v_readfirstlane_b32 s1, v1
+; CHECK-NEXT:    v_readfirstlane_b32 s2, v2
+; CHECK-NEXT:    v_readfirstlane_b32 s3, v3
+; CHECK-NEXT:    v_writelane_b32 v40, s31, 1
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
+; CHECK-NEXT:    v_readlane_b32 s31, v40, 1
+; CHECK-NEXT:    v_readlane_b32 s30, v40, 0
+; CHECK-NEXT:    s_mov_b32 s32, s33
+; CHECK-NEXT:    v_readlane_b32 s0, v40, 2
+; CHECK-NEXT:    s_or_saveexec_b64 s[2:3], -1
+; CHECK-NEXT:    scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
+; CHECK-NEXT:    s_mov_b64 exec, s[2:3]
+; CHECK-NEXT:    s_mov_b32 s33, s0
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+.entry:
+  call void @test_buffer_load_sgpr_plus_imm_offset_noflags(<4 x i32> %LGV)
+  ret void
+}
+
+declare void @test_buffer_load_sgpr_plus_imm_offset_noflags(<4 x i32> inreg)
diff --git a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
index 242b5e9aeaf42..cf3e61e58bdcc 100644
--- a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+++ b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
@@ -1,41 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 < %s | FileCheck %s
 ; FIXME: These tests cannot be tail called, and should be executed in a waterfall loop.
 
 declare hidden void @void_func_i32_inreg(i32 inreg)
-
-; ERR: error: <unknown>:0:0: in function tail_call_i32_inreg_divergent void (i32): illegal VGPR to SGPR copy
-; ERR: error: <unknown>:0:0: in function indirect_tail_call_i32_inreg_divergent void (i32): illegal VGPR to SGPR copy
-
 define void @tail_call_i32_inreg_divergent(i32 %vgpr) {
-; CHECK-LABEL: tail_call_i32_inreg_divergent:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    s_mov_b32 s16, s33
-; CHECK-NEXT:    s_mov_b32 s33, s32
-; CHECK-NEXT:    s_or_saveexec_b64 s[18:19], -1
-; CHECK-NEXT:    buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
-; CHECK-NEXT:    s_mov_b64 exec, s[18:19]
-; CHECK-NEXT:    v_writelane_b32 v40, s16, 2
-; CHECK-NEXT:    s_addk_i32 s32, 0x400
-; CHECK-NEXT:    v_writelane_b32 v40, s30, 0
-; CHECK-NEXT:    v_writelane_b32 v40, s31, 1
-; CHECK-NEXT:    s_getpc_b64 s[16:17]
-; CHECK-NEXT:    s_add_u32 s16, s16, void_func_i32_inreg@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s17, s17, void_func_i32_inreg@rel32@hi+12
-; CHECK-NEXT:     ; illegal copy v0 to s0
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
-; CHECK-NEXT:    v_readlane_b32 s31, v40, 1
-; CHECK-NEXT:    v_readlane_b32 s30, v40, 0
-; CHECK-NEXT:    s_mov_b32 s32, s33
-; CHECK-NEXT:    v_readlane_b32 s4, v40, 2
-; CHECK-NEXT:    s_or_saveexec_b64 s[6:7], -1
-; CHECK-NEXT:    buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
-; CHECK-NEXT:    s_mov_b64 exec, s[6:7]
-; CHECK-NEXT:    s_mov_b32 s33, s4
-; CHECK-NEXT:    s_waitcnt vmcnt(0)
-; CHECK-NEXT:    s_setpc_b64 s[30:31]
   tail call void @void_func_i32_inreg(i32 inreg %vgpr)
   ret void
 }
@@ -43,36 +10,24 @@ define void @tail_call_i32_inreg_divergent(i32 %vgpr) {
 @constant = external hidden addrspace(4) constant ptr
 
 define void @indirect_tail_call_i32_inreg_divergent(i32 %vgpr) {
-; CHECK-LABEL: indirect_tail_call_i32_inreg_divergent:
-; CHECK:       ; %bb.0:
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT:    s_mov_b32 s16, s33
-; CHECK-NEXT:    s_mov_b32 s33, s32
-; CHECK-NEXT:    s_or_saveexec_b64 s[18:19], -1
-; CHECK-NEXT:    buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
-; CHECK-NEXT:    s_mov_b64 exec, s[18:19]
-; CHECK-NEXT:    s_addk_i32 s32, 0x400
-; CHECK-NEXT:    v_writelane_b32 v40, s16, 2
-; CHECK-NEXT:    s_getpc_b64 s[16:17]
-; CHECK-NEXT:    s_add_u32 s16, s16, constant@rel32@lo+4
-; CHECK-NEXT:    s_addc_u32 s17, s17, constant@rel32@hi+12
-; CHECK-NEXT:    s_load_dwordx2 s[16:17], s[16:17], 0x0
-; CHECK-NEXT:    v_writelane_b32 v40, s30, 0
-; CHECK-NEXT:    v_writelane_b32 v40, s31, 1
-; CHECK-NEXT:     ; illegal copy v0 to s0
-; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
-; CHECK-NEXT:    v_readlane_b32 s31, v40, 1
-; CHECK-NEXT:    v_readlane_b32 s30, v40, 0
-; CHECK-NEXT:    s_mov_b32 s32, s33
-; CHECK-NEXT:    v_readlane_b32 s4, v40, 2
-; CHECK-NEXT:    s_or_saveexec_b64 s[6:7], -1
-; CHECK-NEXT:    buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
-; CHECK-NEXT:    s_mov_b64 exec, s[6:7]
-; CHECK-NEXT:    s_mov_b32 s33, s4
-; CHECK-NEXT:    s_waitcnt vmcnt(0)
-; CHECK-NEXT:    s_setpc_b64 s[30:31]
   %fptr = load ptr, ptr addrspace(4) @constant, align 8
   tail call void %fptr(i32 inreg %vgpr)
   ret void
 }
+;CHECK: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
+;CHECK: s_mov_b64 exec, s[18:19]
+;CHECK:	v_writelane_b32 v40, s16, 2
+;CHECK:	s_addk_i32 s32, 0x400
+;CHECK:	v_writelane_b32 v40, s30, 0
+;CHECK:	s_getpc_b64 s[16:17]
+;CHECK:	s_add_u32 s16, s16, void_func_i32_inreg@rel32@lo+4
+;CHECK:	s_addc_u32 s17, s17, void_func_i32_inreg@rel32@hi+12
+;CHECK:	v_readfirstlane_b32 s0, v0
+;CHECK:	v_writelane_b32 v40, s31, 1
+;CHECK:	s_swappc_b64 s[30:31], s[16:17]
+;CHECK:	v_readlane_b32 s31, v40, 1
+;CHECK:	v_readlane_b32 s30, v40, 0
+;CHECK:	s_mov_b32 s32, s33
+;CHECK:	v_readlane_b32 s4, v40, 2
+;CHECK:	s_or_saveexec_b64 s[6:7], -1
+;CHECK:	buffer_load_dword v40, off, s[0:3], s33
\ No newline at end of file

@arsenm (Contributor) left a comment
This is just hacking around the issue at the failure point. You cannot meaningfully do anything with a physical register copy. This needs to be a semantic, context-dependent choice at the point the copy originated, e.g. if this was an SGPR call argument, we either needed to insert the readfirstlane or do a waterfall loop.

; CHECK: s_waitcnt
; CHECK: s_addk_i32
; CHECK: v_readfirstlane_b32
; CHECK: s_mov_b64
Suggested change
; CHECK: s_mov_b64
; CHECK: s_mov_b64

Missing newline error

; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
; GCN: v_accvgpr_read_b32 [[COPY1:v[0-9]+]], a1
; GCN: ; illegal copy [[COPY1]] to s9

This can't be left with no checks

define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_v2i32() #1 {
%vgpr = call <2 x i32> asm sideeffect "; def $0", "=${a[0:1]}"()
call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
ret void
}

attributes #0 = { nounwind }
attributes #1 = { nounwind "target-cpu"="gfx908" }
attributes #1 = { nounwind "target-cpu"="gfx908" }
Missing newline error

@arsenm (Contributor)

arsenm commented Jun 13, 2025

The title also needs to describe when and what it's doing, not just state an issue.
