Description
System Info
When I start the container with Docker as below, I get an error:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
The CPU-only image runs fine.
Error: Could not create backend
Caused by:
Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80
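For context (my understanding): compute cap 70 is the Volta generation (V100), while the prebuilt 1.2 image appears to be compiled for compute cap 80 (Ampere, sm_80). The runtime compute capability can be queried directly, assuming a driver recent enough to support the compute_cap query field:

```shell
# Query the GPU's compute capability (a V100 should report 7.0);
# requires a reasonably recent nvidia-smi
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```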
The nvidia-smi output is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-PCIE-16GB Off | 00000000:04:01.0 Off | 0 |
| N/A 30C P0 38W / 250W | 7695MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 63756 C python 2564MiB |
| 0 N/A N/A 63785 C python 2564MiB |
| 0 N/A N/A 63813 C python 2564MiB |
+-----------------------------------------------------------------------------------------+
OS
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Arch
x86_64
I also tried building the image myself with CUDA_COMPUTE_CAP set to 70:
runtime_compute_cap=70
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap
Here the build fails with "cuda compute cap 70 is not supported":
------
> [builder 2/9] RUN if [ 70 -ge 75 -a 70 -lt 80 ]; then
      nvprune --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  elif [ 70 -ge 80 -a 70 -lt 90 ]; then
      nvprune --generate-code code=sm_80 --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  elif [ 70 -eq 90 ]; then
      nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  else
      echo "cuda compute cap 70 is not supported"; exit 1;
  fi;
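My reading of that build log, restated as a minimal sketch (the function name is mine, not from the Dockerfile): every branch requires a cap of at least 75, so CUDA_COMPUTE_CAP=70 always falls through to the error branch.

```shell
#!/bin/sh
# Hypothetical re-statement of the Dockerfile-cuda branch logic above,
# to show why cap 70 is rejected.
select_prune_arch() {
  cap="$1"
  if [ "$cap" -ge 75 ] && [ "$cap" -lt 80 ]; then
    # Turing range: prune to the exact arch
    echo "sm_$cap"
  elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
    # Ampere/Ada range: keep sm_80 plus the exact arch
    echo "sm_80 sm_$cap"
  elif [ "$cap" -eq 90 ]; then
    # Hopper
    echo "sm_90"
  else
    # cap 70 (Volta/V100) lands here
    echo "cuda compute cap $cap is not supported" >&2
    return 1
  fi
}

select_prune_arch 80   # prints "sm_80 sm_80"
select_prune_arch 70   # error: cuda compute cap 70 is not supported
```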
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Use a machine with the above configuration
- Run the command below with any supported model:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
Expected behavior
The image should run on a V100 GPU (compute capability 7.0).