Description
System Info
When I start the container with Docker as below, I get an error:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
The CPU-only image runs fine.
Error: Could not create backend
Caused by:
Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80
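For context (my understanding): compute cap 70 is the Volta generation (V100), while the prebuilt 1.2 image appears to be compiled for compute cap 80 (Ampere, sm_80). The runtime compute capability can be queried directly, assuming a driver recent enough to support the compute_cap query field:

```shell
# Query the GPU's compute capability (a V100 should report 7.0);
# requires a reasonably recent nvidia-smi
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```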
The nvidia-smi output is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-PCIE-16GB Off | 00000000:04:01.0 Off | 0 |
| N/A 30C P0 38W / 250W | 7695MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 63756 C python 2564MiB |
| 0 N/A N/A 63785 C python 2564MiB |
| 0 N/A N/A 63813 C python 2564MiB |
+-----------------------------------------------------------------------------------------+
OS
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
Arch
x86_64
I also tried building the image myself with CUDA_COMPUTE_CAP set to 70:
runtime_compute_cap=70
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap
Here the build fails with "cuda compute cap 70 is not supported":
------
> [builder 2/9] RUN if [ 70 -ge 75 -a 70 -lt 80 ]; then
      nvprune --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  elif [ 70 -ge 80 -a 70 -lt 90 ]; then
      nvprune --generate-code code=sm_80 --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  elif [ 70 -eq 90 ]; then
      nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;
  else
      echo "cuda compute cap 70 is not supported"; exit 1;
  fi;
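My reading of that build log, restated as a minimal sketch (the function name is mine, not from the Dockerfile): every branch requires a cap of at least 75, so CUDA_COMPUTE_CAP=70 always falls through to the error branch.

```shell
#!/bin/sh
# Hypothetical re-statement of the Dockerfile-cuda branch logic above,
# to show why cap 70 is rejected.
select_prune_arch() {
  cap="$1"
  if [ "$cap" -ge 75 ] && [ "$cap" -lt 80 ]; then
    # Turing range: prune to the exact arch
    echo "sm_$cap"
  elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
    # Ampere/Ada range: keep sm_80 plus the exact arch
    echo "sm_80 sm_$cap"
  elif [ "$cap" -eq 90 ]; then
    # Hopper
    echo "sm_90"
  else
    # cap 70 (Volta/V100) lands here
    echo "cuda compute cap $cap is not supported" >&2
    return 1
  fi
}

select_prune_arch 80   # prints "sm_80 sm_80"
select_prune_arch 70   # error: cuda compute cap 70 is not supported
```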
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Use a machine with the above configuration
- Run the command below with any supported model:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
Expected behavior
The image should run on a V100 GPU (compute capability 7.0).