QwQ-32B model outputs garbled text after receiving more than 50,000 characters #3215

Open
1 of 3 tasks
IcarusLove520 opened this issue Apr 9, 2025 · 1 comment
Comments

@IcarusLove520

System Info

Ubuntu 20.04

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

v1.4.0

The command used to start Xinference

docker run \
  -v /opt/xinference/.xinference:/root/.xinference \
  -v /opt/xinference/.cache/huggingface:/root/.cache/huggingface \
  -v /data2/xinference/models:/root/models \
  -p 9997:9997 \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/root/models \
  -e VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
  -e CUDA_LAUNCH_BLOCKING=1 \
  -e TORCH_USE_CUDA_DSA=1 \
  --gpus all \
  -d \
  xprobe/xinference:1.4.0 \
  xinference-local -H 0.0.0.0 --log-level debug
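For context, a minimal sketch (assuming the container is reachable at localhost:9997 via the port mapping above) of checking that the server is up and listing running models through the OpenAI-compatible API:

```python
# Quick health check against the Xinference server started by the command above.
# Assumes the port mapping from the docker run command (localhost:9997).
import requests

resp = requests.get("http://localhost:9997/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))  # prints the UID of each running model
```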

Reproduction

QwQ-32B deployed on four A100 GPUs.

(two screenshots of the deployment omitted)
After launching, feeding a long text into the model returns garbled output (inputs over 50,000 characters come back garbled; inputs over 100,000 characters raise an error, and the Docker container has to be restarted to recover).
Garbled output:

(screenshot of the garbled output omitted)
Error output when the input exceeds 100,000 characters: Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f55fe16c446 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f55fe1166e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f55fe5d0a18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f55a7685726 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f55a768a3f0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f55a7691b5a in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f55a769361d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x145c0 (0x7f56991055c0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so)
frame #8: + 0x94b43 (0x7f572d32cb43 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f572d3bdbb4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
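For reference, a rough reproduction sketch against the OpenAI-compatible endpoint; the model UID qwq-32b and the repeated filler text are assumptions for illustration, not the exact input used:

```python
# Rough reproduction sketch: send a prompt longer than 50,000 characters.
# The base_url matches the docker run command above; the model UID "qwq-32b"
# is a placeholder -- use the UID shown by Xinference for the launched model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

long_prompt = "请总结以下文档。\n" + "测试文本。" * 12000  # roughly 60,000 characters

resp = client.chat.completions.create(
    model="qwq-32b",
    messages=[{"role": "user", "content": long_prompt}],
)
print(resp.choices[0].message.content)  # comes back garbled at this input length
```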

Expected behavior

We expect to be able to use QwQ-32B's full 130K context length.

@XprobeBot XprobeBot added the gpu label Apr 9, 2025
@XprobeBot XprobeBot added this to the v1.x milestone Apr 9, 2025
@qinxuye
Contributor

qinxuye commented Apr 9, 2025

For this release we plan to upgrade the vLLM bundled in the Docker image to the latest version; once that lands, we will check whether this problem still occurs.
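As a side note, a quick sketch for confirming which vLLM build the image actually ships (run inside the running container, e.g. via docker exec); this only verifies the version, it is not a fix:

```python
# Print the vLLM and torch versions bundled in the xprobe/xinference image.
import torch
import vllm

print("vllm:", vllm.__version__)
print("torch:", torch.__version__)
```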
