[Release 2.6] Triton/inductor related optimisations #2008

Open

jataylo wants to merge 5 commits into release/2.6

@jataylo commented Mar 27, 2025

  • Update the Triton pin to a performance-optimised commit
  • Add an experimental exhaustive tuning mode for flex_attention, enabled via the TORCHINDUCTOR_EXHAUSTIVE_FLEX_ATTENTION_EXPERIMENTAL environment variable
  • Add AMD Triton backend-specific kernel argument (kernarg) support for waves_per_eu and kpack
  • Fix a flex_attention bug that occurs when num_warps=8
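As a rough illustration of what an "exhaustive" tuning mode means, the sketch below enumerates the full Cartesian product of a few candidate kernel parameters instead of a short hand-picked list, and only does so when the environment variable is set. The helper names, the parameter value ranges, and the assumption that the variable is enabled with the value "1" are all hypothetical; this is not the actual Inductor implementation.

```python
import itertools
import os

# Environment variable named in this PR; treating "1" as "enabled" is an assumption.
EXHAUSTIVE_ENV = "TORCHINDUCTOR_EXHAUSTIVE_FLEX_ATTENTION_EXPERIMENTAL"

# Hypothetical default: a short, hand-picked list of
# (BLOCK_M, BLOCK_N, num_warps, num_stages) tuples.
DEFAULT_CONFIGS = [
    (128, 64, 4, 1),
    (64, 64, 4, 1),
]

# Hypothetical candidate values, used only when exhaustive tuning is enabled.
CANDIDATES = {
    "BLOCK_M": [16, 32, 64, 128],
    "BLOCK_N": [16, 32, 64, 128],
    "num_warps": [2, 4, 8],
    "num_stages": [1, 2],
}


def exhaustive_enabled(environ=os.environ):
    """Return True if the experimental exhaustive mode is switched on."""
    return environ.get(EXHAUSTIVE_ENV, "0") == "1"


def flex_attention_search_space(environ=os.environ):
    """Return the list of configs a (hypothetical) autotuner would benchmark."""
    if not exhaustive_enabled(environ):
        return list(DEFAULT_CONFIGS)
    keys = ["BLOCK_M", "BLOCK_N", "num_warps", "num_stages"]
    # Every combination of candidate values: 4 * 4 * 3 * 2 = 96 configs.
    return list(itertools.product(*(CANDIDATES[k] for k in keys)))
```

The cost of this approach is compile time: every extra config has to be compiled and benchmarked, which is presumably why the mode is opt-in and marked experimental.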

jataylo and others added 5 commits March 13, 2025 11:23
…8437)

This splits pytorch#147315 into two PRs. This PR adds general support for the kpack and waves_per_eu Triton kernel arguments on the AMD backend; see pytorch#147315 for more detail.

A follow-up PR will update the configs used by ROCm, but that requires pytorch#147452 to land first.

Pull Request resolved: pytorch#148437
Approved by: https://github.com/eellison, https://github.com/jansel

(cherry picked from commit 8059ead)
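To make the idea of backend-specific kernel arguments concrete, here is a minimal sketch of a tuning config that carries the AMD-only waves_per_eu and kpack fields and forwards them to a kernel launch only when targeting HIP. The class TuningConfig, its field defaults, and the is_hip switch are hypothetical and are not Inductor's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TuningConfig:
    """Hypothetical autotuning config with optional AMD-specific kernel args."""

    block_m: int
    block_n: int
    num_warps: int = 4
    num_stages: int = 1
    # AMD-only knobs; None means "leave it to the backend default".
    waves_per_eu: Optional[int] = None
    kpack: Optional[int] = None

    def launch_kwargs(self, is_hip: bool) -> dict:
        """Build keyword arguments for a kernel launch on the given backend."""
        kwargs = {
            "BLOCK_M": self.block_m,
            "BLOCK_N": self.block_n,
            "num_warps": self.num_warps,
            "num_stages": self.num_stages,
        }
        if is_hip:
            # Only forward the AMD-specific args that were actually set,
            # so non-HIP backends never receive keywords they do not accept.
            if self.waves_per_eu is not None:
                kwargs["waves_per_eu"] = self.waves_per_eu
            if self.kpack is not None:
                kwargs["kpack"] = self.kpack
        return kwargs
```

Keeping the AMD fields optional means existing configs need no changes, which fits the plan described above of landing general plumbing first and updating the ROCm configs in a follow-up.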
@jataylo (Author) commented Mar 27, 2025

I'd like internal CI to run before merging; hopefully it's working.

@rocm-repo-management-api bot commented Mar 27, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jataylo changed the title from "Triton/inductor related optimisations" to "[Release 2.6] Triton/inductor related optimisations" on Mar 31, 2025
@jataylo (Author) commented Mar 31, 2025

@jithunnair-amd @pruthvistony Local testing of the Inductor test subset looks good. Can we merge this?

It looks like pytorch_inductor failed for this job, but it's not obvious whether it was already failing.

@rocm-repo-management-api bot commented Apr 1, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api bot commented Apr 2, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected an error while building PyTorch:

[7345/7886] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/torch/csrc/cuda/memory_snapshot.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
[7346/7886] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Sorting.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
[7347/7886] Building CXX object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Blas.cpp.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Blas.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DAT_PER_OPERATOR_HEADERS -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DPYTORCH_LAYERNORM_FAST_RECIPROCAL -DROCM_VERSION=60304 -DTORCH_ENABLE_LLVM -DTORCH_HIP_BUILD_MAIN_LIB -DTORCH_HIP_VERSION=603 -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_PROF_API=1 -DUSE_ROCM -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -D__HIP_PLATFORM_AMD__ -D__HIP_PLATFORM_AMD__=1 -Dtorch_hip_EXPORTS -I/var/lib/jenkins/pytorch/build/aten/src -I/var/lib/jenkins/pytorch/aten/src -I/var/lib/jenkins/pytorch/build -I/var/lib/jenkins/pytorch -I/var/lib/jenkins/pytorch/cmake/../third_party/benchmark/include -I/opt/llvm/include -I/var/lib/jenkins/pytorch/third_party/onnx -I/var/lib/jenkins/pytorch/build/third_party/onnx -I/var/lib/jenkins/pytorch/nlohmann -I/opt/rocm/hcc/include -I/opt/rocm/rocblas/include -I/opt/rocm/hipsparse/include -I/opt/rocm/include/rccl -I/var/lib/jenkins/pytorch/aten/src/THH -I/var/lib/jenkins/pytorch/aten/src/ATen/hip -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/include -I/var/lib/jenkins/pytorch/aten/src/ATen/../../../third_party/composable_kernel/library/include -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck -I/var/lib/jenkins/pytorch/build/caffe2/aten/src -I/var/lib/jenkins/pytorch/aten/src/ATen/.. -I/var/lib/jenkins/pytorch/torch/include -I/var/lib/jenkins/pytorch/c10/hip/../.. -I/var/lib/jenkins/pytorch/c10/.. 
-I/var/lib/jenkins/pytorch/torch/csrc/api -I/var/lib/jenkins/pytorch/torch/csrc/api/include -I/var/lib/jenkins/pytorch/build/third_party/gloo/hip -isystem /opt/rocm-6.3.4/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.10/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -isystem /var/lib/jenkins/pytorch/INTERFACE -isystem /var/lib/jenkins/pytorch/third_party/nlohmann/include -isystem /opt/rocm/include -isystem /opt/rocm-6.3.4/include/hiprand -isystem /opt/rocm-6.3.4/include/rocrand -isystem /opt/rocm/magma/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format 
-Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DCUDA_HAS_FP16=1 -DUSE_ROCM -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -DTORCH_HIP_VERSION=603 -Wno-shift-count-negative -Wno-shift-count-overflow -Wno-duplicate-decl-specifier -DCAFFE2_USE_MIOPEN -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP -std=c++17 -DHIPBLAS_V2 -DHIPBLASLT_VEC_EXT -D_GLIBCXX_USE_CXX11_ABI=1 -DHIP_ENABLE_WARP_SYNC_BUILTINS -DHIP_VERSION=6 -DUSE_MIOPEN -MD -MT caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Blas.cpp.o -MF caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Blas.cpp.o.d -o caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/Blas.cpp.o -c /var/lib/jenkins/pytorch/aten/src/ATen/native/hip/Blas.cpp
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/hip/Blas.cpp:15:
/var/lib/jenkins/pytorch/aten/src/ATen/hip/tunable/TunableGemm.h:25:10: fatal error: c10/util/Float8_e8m0fnu.h: No such file or directory
   25 | #include <c10/util/Float8_e8m0fnu.h>
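The failure above is a missing header: TunableGemm.h includes c10/util/Float8_e8m0fnu.h, which the compiler could not find in this checkout. One cheap way to catch this class of error before a full build is to check that every angle-bracket include in a file resolves under the given include roots. The helper missing_includes below is a hypothetical, stdlib-only sketch; it ignores conditional compilation and skips quoted includes, so treat its output as a hint rather than a verdict.

```python
import re
import sys
from pathlib import Path

# Matches angle-bracket includes such as: #include <c10/util/Float8_e8m0fnu.h>
# Quoted includes ("local.h") are deliberately not matched.
INCLUDE_RE = re.compile(r"^\s*#\s*include\s*<([^>]+)>", re.MULTILINE)


def missing_includes(source_text, include_roots, exists=Path.is_file):
    """Return angle-bracket includes in source_text not found under any root.

    #if/#ifdef blocks are not evaluated, and system headers are reported
    as missing unless their directories are passed in include_roots.
    `exists` is injectable so the check can be tested without a filesystem.
    """
    roots = [Path(r) for r in include_roots]
    missing = []
    for header in INCLUDE_RE.findall(source_text):
        if not any(exists(root / header) for root in roots):
            missing.append(header)
    return missing


if __name__ == "__main__":
    # Usage: python check_includes.py <file> <include_root> [<include_root> ...]
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    for header in missing_includes(text, sys.argv[2:]):
        print(f"missing: {header}")
```

Run against aten/src/ATen/hip/tunable/TunableGemm.h with the repository root as an include root, a check like this would be expected to flag the same header the compiler rejected.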

@rocm-repo-management-api bot commented Apr 3, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api bot commented Apr 10, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@jithunnair-amd (Collaborator)

!ci-build
NODE_LABEL="(gfx90a) && !hyd-7c"

@rocm-repo-management-api bot commented Apr 11, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as ABORTED
Links: Blue Ocean view / Build artifacts

@jithunnair-amd (Collaborator)

!ci-build
NODE_LABEL=(gfx90a) && !hyd-7c

@rocm-repo-management-api bot commented Apr 11, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api bot commented Apr 16, 2025

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api bot

Jenkins build for 8bf46b07ecdd31aea1641a30998e0ac00699ef4a commit is in progress
Links: Blue Ocean view / Build artifacts
