[Bug]: Inconsistency between is_contiguous and stride API in HIPGRAPH

🐛 Describe the bug
I could not replicate this scenario with a simple test script. The bug was found in vLLM on ROCm when running meta-llama/Llama-4-Scout-17B-16E-Instruct. The behaviour only occurs in HIPGraph mode + torch.compile; in eager mode + torch.compile, the is_contiguous() API and the stride() API are consistent.
In HIPGraph mode, the tensor A can end up with the following properties:
- .shape: torch.Size([1024, 1])
- .is_contiguous(): True
- .stride(): (1, 1024)
- .is_contiguous(memory_format=torch.channels_last): False
- .is_contiguous(memory_format=torch.contiguous_format): True
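A minimal sketch of the layout checks involved (the helper name dump_layout is mine; the values in the comments are the ones observed inside the HIPGraph-captured region in vLLM, not something this helper reproduces on its own):

```python
import torch

def dump_layout(A: torch.Tensor) -> None:
    # The commented values are what was observed for the problematic tensor
    # on ROCm under HIPGraph capture + torch.compile.
    print(A.shape)                                                  # torch.Size([1024, 1])
    print(A.is_contiguous())                                        # True
    print(A.stride())                                               # (1, 1024)
    print(A.is_contiguous(memory_format=torch.channels_last))      # False
    print(A.is_contiguous(memory_format=torch.contiguous_format))  # True
```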
The expected behaviour is that .stride(1) == 1 is True whenever .is_contiguous() is True.
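As a sketch, this is the invariant I expect to hold (here A stands for the tensor above; this is exactly the check that fails in the HIPGraph case):

```python
# For a contiguous tensor of shape (1024, 1), the innermost stride should be 1.
if A.is_contiguous():
    assert A.stride(1) == 1, f"contiguous tensor has stride {A.stride()}"
```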
Calling A = A.contiguous() does not fix the issue; .stride() is still (1, 1024). The current workaround is A = A.view(-1).reshape(A.shape), as sketched below.
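Roughly what this looks like in place (a sketch, not the exact vLLM code; the printed strides are the values observed on ROCm):

```python
# .contiguous() returns self because is_contiguous() already reports True,
# so the stale (1, 1024) stride survives it.
A = A.contiguous()
print(A.stride())   # still (1, 1024)

# Flattening to 1-D and reshaping back yields a view with freshly computed
# contiguous strides.
A = A.view(-1).reshape(A.shape)
print(A.stride())   # (1, 1)
```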
On CUDA, .stride() returns (1, 1), but on ROCm it returns (1, 1024).
Versions
Collecting environment information...
PyTorch version: 2.7.0a0+git295f2ed
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.3.42133-1b9c17779
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.3.1 24491 1e0fda770a2079fbd71e4b70974d74f62fd3af10)
CMake version: version 3.31.6
Libc version: glibc-2.35
Python version: 3.12.9 (main, Feb 5 2025, 08:49:00) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-116-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.3.42133
MIOpen runtime version: 3.3.0
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.7.0a0+git295f2ed
[pip3] torchvision==0.21.0+7af6987
[pip3] triton==3.2.0+gite5be006a