Rerank model deployed on two 4090 GPUs cannot compute in parallel #3222

Open
yidasanqian opened this issue Apr 10, 2025 · 9 comments

@yidasanqian

[screenshots of GPU utilization]

I am reranking 600 documents concurrently with asyncio, but one GPU is still at 100% load while the other sits at 0%. Is any special configuration required?

@XprobeBot added this to the v1.x milestone Apr 10, 2025
@qinxuye
Contributor

qinxuye commented Apr 10, 2025

Set replica (副本) to 2.
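For reference, a sketch of what that launch might look like with the `xinference` CLI; the exact flag names and model name here are assumptions and should be checked against your installed version:

```shell
# Launch the rerank model with two replicas so requests can be
# load-balanced across both GPUs (flag names are assumptions).
xinference launch --model-name bge-reranker-v2-m3 --model-type rerank --replica 2
```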

@yidasanqian
Author

[screenshots of the replica configuration and GPU index display]
The replica count is 2, so why doesn't the GPU index show 0,1?

@yidasanqian
Author

@qinxuye reranker.xinference_rerank:_rerank_batch:77 - rerank response text: {"detail":"Model actor is out of memory, model id: bge-reranker-v2-m3-1, error: CUDA out of memory. Tried to allocate 2.00 GiB. GPU 0 has a total capacity of 23.55 GiB of which 848.44 MiB is free. Process 159 has 0 bytes memory in use. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 19.13 GiB is allocated by PyTorch, and 525.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"}

How should xinference be configured to avoid this OOM?
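Independently of the server-side configuration, one client-side mitigation is to bound how many rerank requests are in flight at once, so a single replica never has to hold too many batches in GPU memory simultaneously. A minimal asyncio sketch, where `rerank_one` is a hypothetical stand-in for the actual HTTP call to the rerank endpoint:

```python
import asyncio

MAX_IN_FLIGHT = 8  # tune to what one 4090 can handle without OOM

async def rerank_one(doc: str) -> str:
    # Placeholder for the real rerank request to the xinference server.
    await asyncio.sleep(0)
    return doc

async def rerank_all(docs: list[str]) -> list[str]:
    # A semaphore caps the number of concurrently running requests;
    # the remaining tasks wait their turn instead of piling onto the GPU.
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def bounded(doc: str) -> str:
        async with sem:
            return await rerank_one(doc)

    # gather preserves input order in its results.
    return await asyncio.gather(*(bounded(d) for d in docs))

results = asyncio.run(rerank_all([f"doc-{i}" for i in range(600)]))
```

The error message itself also suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the server's environment to reduce fragmentation.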

@qinxuye
Contributor

qinxuye commented Apr 10, 2025

Do you have exactly two GPUs? Just set replica to 2; there is no need to configure a GPU idx.

@yidasanqian
Author

yidasanqian commented Apr 10, 2025

> Do you have exactly two GPUs? Just set replica to 2; there is no need to configure a GPU idx.

More than two.

@qinxuye
Contributor

qinxuye commented Apr 10, 2025

Currently, multiple replicas cannot be pinned to specific GPU indices.

To limit which GPUs XInf uses, set CUDA_VISIBLE_DEVICES when starting it.
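For example, a sketch of restricting the local Xinference server to two specific GPUs at startup; the host/port values are placeholders:

```shell
# Only GPUs 0 and 1 are visible to the process, so replicas can
# only be placed on those devices.
CUDA_VISIBLE_DEVICES=0,1 xinference-local --host 0.0.0.0 --port 9997
```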

@Minamiyama
Collaborator

Couldn't you spin up multiple Docker xinf service instances, assigning one device idx to each? 🤣
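A sketch of that one-container-per-GPU workaround; the image tag and port mappings here are assumptions:

```shell
# Each container sees exactly one GPU and is exposed on its own port.
docker run -d --gpus '"device=0"' -p 9997:9997 xprobe/xinference:latest xinference-local -H 0.0.0.0
docker run -d --gpus '"device=1"' -p 9998:9997 xprobe/xinference:latest xinference-local -H 0.0.0.0
```

A load balancer in front of the two endpoints would then be needed to spread rerank traffic across them.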

@qinxuye
Contributor

qinxuye commented Apr 10, 2025

We plan to make worker_ip and gpu_idx support multiple replicas / distributed inference. It is still being designed.

@jiliangqian

> Do you have exactly two GPUs? Just set replica to 2; there is no need to configure a GPU idx.

I have two GPUs. With replica > 2, how do I deploy multiple replicas of the model on a single card?
