[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel #12591

Merged: 101 commits, Apr 27, 2025
Commits
2702132
Add kernel that supports variable sequence length
rasmith Jan 30, 2025
cb79d54
isort
rasmith Jan 30, 2025
f346cd2
codespell
rasmith Jan 30, 2025
0ffdfae
ruff
rasmith Jan 30, 2025
71a072e
ruff/codespell
rasmith Jan 30, 2025
1f3729d
ruff
rasmith Jan 31, 2025
d6652ac
codespell
rasmith Jan 31, 2025
8f702c8
ruff
rasmith Jan 31, 2025
1f718b2
ruff
rasmith Jan 31, 2025
f9f2aba
ruff
rasmith Jan 31, 2025
107a7a5
ruff
rasmith Jan 31, 2025
ecb3320
ruff
rasmith Jan 31, 2025
4ef102d
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Jan 31, 2025
71f89c5
yapf
rasmith Jan 31, 2025
94258b0
resolve merge
rasmith Feb 7, 2025
239f1d1
unit tests work
rasmith Feb 8, 2025
fc903c3
remove bwd
rasmith Feb 8, 2025
1780946
reformatting
rasmith Feb 8, 2025
aaa0d9e
reformatting
rasmith Feb 8, 2025
9e0d8ce
reformatting
rasmith Feb 8, 2025
9a2afda
reformatting
rasmith Feb 8, 2025
2161514
reformatting
rasmith Feb 8, 2025
1366205
ruff
rasmith Feb 8, 2025
3ed0e91
ruff
rasmith Feb 8, 2025
8b66ddc
ruff
rasmith Feb 8, 2025
bd3b2c7
ruff
rasmith Feb 8, 2025
dd63b79
add unit tests
rasmith Feb 8, 2025
6350ed7
everything seems to work
rasmith Feb 8, 2025
5eff339
codespell
rasmith Feb 10, 2025
0727349
fix incorrect function call
rasmith Feb 11, 2025
f6b2001
add spdx identifier
rasmith Feb 11, 2025
f1a29d3
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Feb 11, 2025
5f478de
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Feb 12, 2025
65ff486
update unit tests
rasmith Feb 13, 2025
95d1795
Try using newer fa kernel
rasmith Feb 22, 2025
e8f1ed7
Merge branch 'main' into ransmith_triton_fav2_vsl
rasmith Feb 22, 2025
44eb67e
revert back to current triton fa
rasmith Feb 22, 2025
e8e6fef
use older triton fa
rasmith Feb 22, 2025
c370096
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Mar 13, 2025
3423c44
Update unit tests, make it work with fp8 llama for ROCm
rasmith Mar 13, 2025
301811b
yapf
rasmith Mar 13, 2025
eda000c
isort
rasmith Mar 13, 2025
a3ca4f1
ruff/yapf
rasmith Mar 13, 2025
90022be
remove main from test file
rasmith Mar 13, 2025
67adb34
mypy
rasmith Mar 13, 2025
3b5ba1a
mypy
rasmith Mar 13, 2025
b8ba91b
isort
rasmith Mar 13, 2025
4bfacad
don't use fp8_out_scale parameter in abstract
rasmith Mar 13, 2025
b911e3f
ruff
rasmith Mar 13, 2025
ae7f6c6
remove cpa if statement
rasmith Mar 14, 2025
345ec5d
eight bit dtypes
rasmith Mar 14, 2025
5c399ea
eight bit dtype
rasmith Mar 14, 2025
787eb33
eight bit dtype
rasmith Mar 14, 2025
d39aee9
init fp8_out_scale in layer.py
rasmith Mar 14, 2025
b22250f
remove parameter
rasmith Mar 14, 2025
9508130
update autotune
rasmith Mar 14, 2025
3a7048c
merge main
rasmith Mar 21, 2025
5e4a79f
remove variable redefinitions
rasmith Mar 21, 2025
ea67811
add back in vscale float
rasmith Mar 21, 2025
6364b73
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Mar 21, 2025
62c2efb
merge main
rasmith Mar 28, 2025
fb94876
remove extra q_scale and is_navi
rasmith Mar 28, 2025
f15c554
add booloean
rasmith Mar 28, 2025
94ac0b1
use kv_cache_dtype
rasmith Mar 28, 2025
2534cef
remove variable
rasmith Mar 28, 2025
9742688
rename _fp8_out_scale
rasmith Mar 28, 2025
49a83ba
remove fp8 integration and break into separate PR
rasmith Mar 28, 2025
ca149e3
clean up unit tests and triton kernel
rasmith Apr 6, 2025
77135a4
ruff
rasmith Apr 6, 2025
57f870a
output fp16 from triton FA, update test for o_scale, accept only tens…
rasmith Apr 8, 2025
e00e910
overwrote wrong file
rasmith Apr 8, 2025
314220d
Merge branch 'main' into ransmith_triton_fav2_vsl
rasmith Apr 8, 2025
25f302f
add is_rocm_rdna and is_rocm_cdna methods
rasmith Apr 8, 2025
bd62156
move has_rdna_target/has_cdna_target impl to rocm.py
rasmith Apr 8, 2025
080d9f9
use tl.constexpr in kernel instead of passing
rasmith Apr 8, 2025
d8a4f2c
use scaled_fp8_quant and simplify unit tests more
rasmith Apr 9, 2025
cb535be
add subroutine to quantize fp8 triton tensors
rasmith Apr 9, 2025
7deeb60
remove rocm_cdna and rename bool vars
rasmith Apr 10, 2025
33bc577
remove is_rdna_target
rasmith Apr 10, 2025
d3f08d3
persistent->is_persistent
rasmith Apr 10, 2025
9344284
remove dropout
rasmith Apr 17, 2025
25b44fb
remove unnecessary params
rasmith Apr 17, 2025
3b7d410
move the cdna check back into triton kernel
rasmith Apr 17, 2025
0918625
remove bhsd from some tests and add more layout choices and annotated…
rasmith Apr 17, 2025
da06687
refactor metadata __init__ and use in tests
rasmith Apr 18, 2025
7f4e4d6
remove has_cdna_target from interface.py
rasmith Apr 22, 2025
f396453
handle cases where integer overflow could occur
rasmith Apr 22, 2025
acd8e56
remove comment
rasmith Apr 22, 2025
f798230
revert to see if segv goes away
rasmith Apr 23, 2025
e5bc500
try avoiding overflow while also avoiding crashing nvidia triton comp…
rasmith Apr 24, 2025
db52071
make bools constexpr so Triton compiler doesn't crash
rasmith Apr 24, 2025
49ff9d2
remove persistent
rasmith Apr 24, 2025
500d6a8
adjust tolerances for unit tests
rasmith Apr 24, 2025
0b72647
requires grad and adjust o_scale test param
rasmith Apr 24, 2025
d4186cf
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Apr 25, 2025
6e82d42
use triton.language.constexpr correctly and add check for cuda 9.0 fo…
rasmith Apr 26, 2025
f09db8f
Merge branch 'ransmith_triton_fav2_vsl' of github.com:rasmith/vllm in…
rasmith Apr 26, 2025
a5c3ad5
Merge branch 'main' into ransmith_triton_fav2_vsl
rasmith Apr 26, 2025
67e44d7
Merge branch 'main' into ransmith_triton_fav2_vsl
rasmith Apr 26, 2025
dd66506
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Apr 26, 2025
7315172
Merge branch 'vllm-project:main' into ransmith_triton_fav2_vsl
rasmith Apr 26, 2025