chore: Dockerfile-cuda - Retain major CC when pruning static cuBLAS lib #635

Open

wants to merge 1 commit into main
Conversation

polarathene

What does this PR do?

Pruning cuBLAS for CC 7.5 now also retains sm_70 in addition to the sm_75 target. See #610 (comment) for more information.

polarathene commented Jun 13, 2025

NOTE: There is no known need to do this for TEI; however, NVIDIA encourages retaining the major CC (and any minor CCs in between) when using nvprune on cuBLAS.


Feel free to close this PR if you'd prefer to wait until there's a relevant bug report. My understanding is that this should only be an issue when a cuBLAS kernel defers to the sm_70 cubin where an sm_75 one would have been equivalent.

For example, in the current base image used for building, the static cuBLAS library contains 184 cubins for sm_70 vs only 8 for sm_75:

```console
$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_70.*\.' | wc -l
184

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_75.*\.' | wc -l
8

# Individual cubins:
$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -E '\.sm_75.*\.'
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
ELF file   21: libcublas_static.21.sm_75.cubin
ELF file   29: libcublas_static.29.sm_75.cubin
ELF file   37: libcublas_static.37.sm_75.cubin
ELF file   45: libcublas_static.45.sm_75.cubin
ELF file   53: libcublas_static.53.sm_75.cubin
ELF file   61: libcublas_static.61.sm_75.cubin
```
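As a quick way to summarize which real archs remain in an archive, the same `grep` approach can be collapsed into a per-arch count (a sketch; the `sample` variable below stands in for a live `cuobjdump` run, using a few lines of its output format):

```shell
# Summarize cubin counts per real arch. `sample` stands in for:
#   cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a
sample='ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
ELF file    1: libcublas_static.1.sm_70.cubin'

# Extract each sm_XX token, then count occurrences per arch.
printf '%s\n' "$sample" | grep -oE 'sm_[0-9]+' | sort | uniq -c
```

Against a pruned `libcublas_static.a`, this shows at a glance whether both sm_70 and sm_75 cubins survived.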

I'm not entirely sure why it might matter to retain the minor CC versions in between (when present).


The concern does not apply to the other two supported real archs handled via nvprune: sm_80 is itself the CC major and is already retained, while sm_90 is the only real arch for its CC major, so there is nothing older to keep:

```shell
    nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
then \
    nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
```
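For reference, the CC 7.5 branch this PR adjusts takes the same shape, retaining the sm_70 cubins alongside sm_75 (a sketch of the changed Dockerfile fragment, not the verbatim diff):

```shell
# Sketch: prune the static cuBLAS lib for CC 7.5 while also retaining
# the CC major (sm_70), as NVIDIA recommends for nvprune on cuBLAS.
if [ ${CUDA_COMPUTE_CAP} -eq 75 ]; \
then \
    nvprune --generate-code code=sm_70 --generate-code code=sm_75 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
```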
