QwQ-32B model outputs garbled text after receiving more than 50,000 characters #3215

Open
1 of 3 tasks
IcarusLove520 opened this issue Apr 9, 2025 · 1 comment
Comments

@IcarusLove520

System Info

Ubuntu 20.04

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

v1.4.0

The command used to start Xinference

docker run \
  -v /opt/xinference/.xinference:/root/.xinference \
  -v /opt/xinference/.cache/huggingface:/root/.cache/huggingface \
  -v /data2/xinference/models:/root/models \
  -p 9997:9997 \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -e XINFERENCE_HOME=/root/models \
  -e VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
  -e CUDA_LAUNCH_BLOCKING=1 \
  -e TORCH_USE_CUDA_DSA=1 \
  --gpus all \
  -d \
  xprobe/xinference:1.4.0 \
  xinference-local -H 0.0.0.0 --log-level debug
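For context, a minimal sketch (assuming the container is reachable at localhost:9997 via the port mapping above) of checking that the server is up and listing running models through the OpenAI-compatible API:

```python
# Quick health check against the Xinference server started by the command above.
# Assumes the port mapping from the docker run command (localhost:9997).
import requests

resp = requests.get("http://localhost:9997/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model.get("id"))  # prints the UID of each running model
```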

Reproduction

QwQ-32B deployed on four A100 GPUs.

(two screenshots of the deployment omitted)
After launching, feeding a long text into the model returns garbled output (inputs over 50,000 characters come back garbled; inputs over 100,000 characters raise an error, and the Docker container has to be restarted to recover).
Garbled output:

(screenshot of the garbled output omitted)
Error output when the input exceeds 100,000 characters: Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f55fe16c446 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f55fe1166e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f55fe5d0a18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f55a7685726 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f55a768a3f0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f55a7691b5a in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f55a769361d in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x145c0 (0x7f56991055c0 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so)
frame #8: + 0x94b43 (0x7f572d32cb43 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f572d3bdbb4 in /usr/lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
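For reference, a rough reproduction sketch against the OpenAI-compatible endpoint; the model UID qwq-32b and the repeated filler text are assumptions for illustration, not the exact input used:

```python
# Rough reproduction sketch: send a prompt longer than 50,000 characters.
# The base_url matches the docker run command above; the model UID "qwq-32b"
# is a placeholder -- use the UID shown by Xinference for the launched model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

long_prompt = "请总结以下文档。\n" + "测试文本。" * 12000  # roughly 60,000 characters

resp = client.chat.completions.create(
    model="qwq-32b",
    messages=[{"role": "user", "content": long_prompt}],
)
print(resp.choices[0].message.content)  # comes back garbled at this input length
```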

Expected behavior

We expect to be able to use QwQ-32B's full 130K context length.

@XprobeBot XprobeBot added the gpu label Apr 9, 2025
@XprobeBot XprobeBot added this to the v1.x milestone Apr 9, 2025
@qinxuye
Contributor

qinxuye commented Apr 9, 2025

For this release we plan to upgrade the vLLM bundled in the Docker image to the latest version; once that lands, we will check whether this problem still occurs.
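As a side note, a quick sketch for confirming which vLLM build the image actually ships (run inside the running container, e.g. via docker exec); this only verifies the version, it is not a fix:

```python
# Print the vLLM and torch versions bundled in the xprobe/xinference image.
import torch
import vllm

print("vllm:", vllm.__version__)
print("torch:", torch.__version__)
```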
