Update Docker images to latest Ubuntu version #610

Open
wants to merge 2 commits into
base: main

Conversation

@vrdn-23 vrdn-23 commented May 22, 2025

What does this PR do?

The Docker base images haven't been updated in a while, so I was wondering if we could port them over to newer base images and the latest Ubuntu LTS version. Let me know if there are any concerns!

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

cc @Narsil @alvarobartt

@Narsil
Collaborator

Narsil commented Jun 2, 2025

What's the rationale here?

Upgrading deps is indeed nice, but CUDA 12.8 is rather new (Jan 2025), so it would make TEI fail to run on any older deployments/nodes. Unless it unlocks things, I don't think we should upgrade at the moment.

Ubuntu 24 should be ok.

@vrdn-23
Author

vrdn-23 commented Jun 2, 2025

I was hoping to get us upgraded to the latest CUDA 12.x version, since within minor releases CUDA is mostly backwards compatible.
If I understand the link correctly, since the current TEI image is on 12.2, most nodes/deployments will already have the minimum required driver version to run 12.8.
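For what it's worth, that kind of node-side check can be sketched as a plain version comparison. The minimum-driver value below is my understanding of the CUDA 12.x requirement on Linux, and the installed value is illustrative (in practice it would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`):

```shell
# Hedged sketch: is the installed NVIDIA driver new enough for CUDA 12.x?
MIN_DRIVER="525.60.13"    # assumed minimum Linux driver for CUDA 12.x
INSTALLED="535.104.05"    # illustrative value; normally read from nvidia-smi

# sort -V does a version-aware sort; if the minimum sorts first, we're OK.
if [ "$(printf '%s\n' "$MIN_DRIVER" "$INSTALLED" | sort -V | head -n1)" = "$MIN_DRIVER" ]; then
  echo "driver OK for CUDA 12.x"
else
  echo "driver too old"
fi
```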

Let me know if I misunderstood something here @Narsil

@Narsil
Collaborator

Narsil commented Jun 3, 2025

I think that doesn't hold for the nvidia container: NVIDIA/nvidia-container-toolkit#940

It's been a while since I've personally seen this arise, since we try to keep up with newer versions of everything, but the CUDA version of the node has caused issues in the past in clusters I manage.

Is there any particular reason for wanting to upgrade? (The stance here is that if it's not broken, there's no need to fix it, and we can take advantage of a later minor release to do such potentially breaking version upgrades.)

@vrdn-23
Author

vrdn-23 commented Jun 3, 2025

@Narsil Thanks for pointing out that issue. Forward compatibility is not something that I considered.

Is there any particular reason for wanting to upgrade?

I think the rationale is just to ensure that we don't fall too far behind on dependency upgrades. CUDA 12.2 was released in June 2023, and the driver version shipped with 12.2 is not really compatible with some of the newer GPUs coming out (see the AWS EC2 instance/NVIDIA-driver compatibility matrix).

I am fine with reverting the PR to just the Ubuntu update, and we can maybe update the CUDA version in a later major TEI release (1.8.0 or 2.0?), but the current change is still technically only a minor version update of the CUDA toolkit itself. So I'm a little ambivalent/curious about how this would fit into the TEI release lifecycle?

@Narsil
Collaborator

Narsil commented Jun 3, 2025

1.8 is fine for those kinds of upgrades.

If you just update Ubuntu I will definitely merge as-is; otherwise we can leave it as-is and I'll merge when 1.8 hits (there are no plans just yet; it usually happens when something significant lands, not necessarily a breaking change).

Again, I think it's welcome in general to update regularly, but having been bitten in the past, and seeing no obvious reason right now, I tend to delay including them by default.

Thanks a lot for the PR regardless.

@vrdn-23
Author

vrdn-23 commented Jun 3, 2025

@Narsil Thanks for the update! I'll revert the CUDA changes then, so they can make the 1.7.1 release.

@vrdn-23
Author

vrdn-23 commented Jun 3, 2025

Oops, looks like I was too late! Either way, I can keep track of this and raise another PR when the time is right to update to the latest CUDA version. Thanks for the feedback and the discussion!

@vrdn-23 vrdn-23 changed the title Update Docker images to latest CUDA version and Ubuntu version Update Docker images to latest Ubuntu version Jun 4, 2025
@polarathene

For changes like this regarding CUDA, it's probably better to measure any actual gain from bumping the minimum version. Likewise for the concern about lacking support: an actual case of broken compatibility would make the case.

If someone does have one, I'd appreciate that, but my understanding of cudarc usage is the following:

  • dynamic-loading (default) attempts to find libcuda.so or equivalent. A useful choice when you don't want to force CUDA as a dependency to launch your program (perhaps it also supports CPU or ROCm, for example). If it fails, a panic is triggered (or a less helpful failure if you've built with panic = "abort").
  • dynamic-linking adds generic links to the libs; if there is no version pinning there (I don't think there is with how cudarc links), then before your program initializes, the system linker will try to resolve those libs (which, unlike with dynamic-loading, you can actually list via ldd / patchelf --print-needed). If a dep is missing, you get a failure message on the terminal before any handover to your app (helpful if you've built with panic = "abort").
  • static-linking embeds the .a static libs, bloating size considerably, but still requires a dynamic link to libcuda.so. That dynamic link is tied to the CUDA driver version, so there could be potential incompatibility concerns there.
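A quick way to see the difference between the first two modes is to list what a binary declares at link time. The binary path here is a placeholder; in practice you'd point it at the built `text-embeddings-router` binary:

```shell
# List the shared libraries a binary declares (the dynamic-linking case).
# With dynamic-loading, libcuda.so will NOT show up here, because it is
# resolved at runtime via dlopen instead of by the system linker.
BIN="${1:-/bin/ls}"   # placeholder; substitute the text-embeddings-router binary
ldd "$BIN"

# Alternatively, print only the raw DT_NEEDED entries without resolving them:
# patchelf --print-needed "$BIN"
```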

That said, there is no actual build of those common CUDA libs regardless of choice. static-linking doesn't really benefit from LTO (you can have cudarc opt in to the other CUDA features but only have a small program that uses the driver feature, resulting in an approx 700 MB binary). Everything is already pre-compiled, and I don't think there's any build-time optimization involved with cudarc; it's just providing an API? (Unless it has some conditionals to prefer certain calls when the target CUDA version is high enough?)

I had assumed, then, that the only actual compatibility concern was needing an API call that wouldn't build because it isn't available in a lower CUDA version, which is a more obvious reason for a version bump... or, as this project already does in its Docker image builds, nvprune stripping archs from the NVIDIA-supplied libs to minimize size... but that's completely unrelated to cudarc. If your project requires features only available from sm_80 / compute_80 onwards, that's the actual minimum target, and the minimum CUDA version is tied to that?

As such, bumping the CUDA version for the build, be that in cudarc or in the Ubuntu image here, shouldn't make any notable difference in support, provided the linked CUDA libs cover the expected archs. I haven't checked, but I assume NVIDIA EOLs arch support within their images, so sm_75 could be missing, for example.


Have I understood that all correctly?

There probably isn't much benefit to static-linking if you're distributing within a container that can provide the libs to link dynamically. It should only benefit distribution outside of the container, in other environments.

In both cases libcuda.so is provided by the host system at runtime (the one in the image isn't used for that; it serves only as a stub for linking).

@vrdn-23
Author

vrdn-23 commented Jun 12, 2025

@Narsil is this good to merge?

@vrdn-23
Author

vrdn-23 commented Jun 12, 2025

Sorry for the late reply, but this was a really eye-opening comment for me @polarathene.
You are absolutely right that the CUDA version of the TEI image shouldn't really matter here, because the linking for text-embeddings-router is resolved by the host machine at runtime!

So in this case, updating the CUDA version doesn't really provide any benefits, because the underlying libcuda.so is not the one used by the deployed image!
Thanks again for taking the time to write this out.

@polarathene

updating the CUDA version doesn't really provide any benefits

I am still trying to grok it myself, but I do know the version of nvcc can be relevant (especially if compiling kernels with the bindgen_cuda crate, which filters compute capabilities based on what the nvcc command supports). I'm not entirely sure if that's tied to the CUDA version, but a newer image would allow building with a higher compute capability.

The build is also tied to the major version of CUDA, although from what I've read it's only been major-version compatible since CUDA 12? Once there is a CUDA 13, it might differ again 😅

Mainly though, a virtual or real architecture with a compute capability is targeted for the build, where a virtual target can provide forward-compatible PTX that is compiled to CUDA kernels at runtime via JIT, and you can use that to set the baseline/minimum compute capability required for compatibility. Too low and you may miss some performance benefits on newer hardware, though (more than one PTX can be bundled too).
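As a sketch of that last point, here is a hypothetical helper (not part of TEI, cudarc, or bindgen_cuda) that assembles nvcc `-gencode` flags: real SASS (`code=sm_*`) for each listed arch, plus forward-compatible PTX (`code=compute_*`) for the lowest one so newer GPUs can JIT:

```shell
# Hypothetical helper: build nvcc -gencode flags for a set of compute capabilities.
# The first argument is the minimum CC; PTX for it is bundled for forward compat.
gencode_flags() {
  local min="$1"; shift
  # Virtual arch: PTX that newer GPUs can JIT-compile at runtime.
  local flags="-gencode arch=compute_${min},code=compute_${min}"
  local cc
  # Real archs: pre-built SASS for each explicitly listed capability.
  for cc in "$min" "$@"; do
    flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
  done
  echo "$flags"
}

gencode_flags 75 80 90
```

The resulting flags would be passed to an `nvcc` invocation; I haven't verified how bindgen_cuda composes its own flags, so treat this purely as an illustration of the virtual-vs-real targeting described above.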


This separate cuda crate is something I wasn't familiar with in my previous comment, but I am aware of TEI building its own kernels for a subset of targets, so there is some relevance there.

@polarathene

polarathene commented Jun 13, 2025

UPDATE: PR: #635


FWIW, I recently came across this advice:

In case of cuBLAS, particular care must be taken if using nvprune with compute capabilities, whose minor revision number is different than 0.

To reduce binary size, cuBLAS may only store major revision equivalents of CUDA binary files for kernels reused between different minor revision versions.

Therefore, to ensure that a pruned library does not fail for arbitrary problems, the user must keep binaries for a selected architecture and all prior minor architectures in its major architecture.

Yet this project uses nvprune on cuBLAS for sm_75 in exactly the way that link demonstrates as the discouraged example:

if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ]; \
  then \
    nvprune --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \

The command should apparently also keep the major architecture by including --generate-code code=sm_70:

nvprune --generate-code code=sm_70 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a

I'm not sure how compatible that is, as I haven't tested it (the project README specifically states that CC 7.5 is the minimum without the FA feature, so keeping sm_70 may fail, but the reason for this minimum CC requirement isn't clarified). I don't know if this was discussed in the past, or if there have been any reports of that target failing from that image build, but I thought I'd bring it to your attention.


As the README doesn't clarify why CC 7.5 is the minimum, perhaps the above is fine? It's only pruning a static lib, not building for a lower CC version.
