LLM stops answering halfway through #5271

Open
Longleaves opened this issue Mar 21, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@Longleaves

Longleaves commented Mar 21, 2025

Problem Description
After deploying the QwQ and Qwen models with Xinference, I use the Langchain-Chatchat 0.3.1 web UI for Q&A. The model answers about halfway and then stops; the output is cut off. Does chatchat have an output token limit, and where can it be set? Has anyone run into something similar? What could be causing this?

Expected Result
The model produces a complete output.

Actual Result
The answer is incomplete.
[screenshot: truncated answer]

Environment Information

  • Langchain-Chatchat version / commit: 0.3.1
  • Deployment method (pypi / source / docker): pypi
  • Model inference framework (Xinference / Ollama / OpenAI API, etc.): Xinference
  • LLM models (GLM-4-9B / Qwen2-7B-Instruct, etc.): Qwen2.5-7B-Instruct, QwQ
  • Embedding model (bge-large-zh-v1.5 / m3e-base, etc.): bge-large-zh-v1.5
  • Vector store type (faiss / milvus / pg_vector, etc.): faiss
  • Operating system and version: Ubuntu
  • Python version: 3.10
  • Inference hardware (GPU / CPU / MPS / NPU, etc.): GPU
@Longleaves Longleaves added the bug Something isn't working label Mar 21, 2025
@Longleaves Longleaves changed the title "Why does the output break off halfway" to "LLM stops answering halfway through" Mar 21, 2025

dosubot bot commented Mar 21, 2025

Yes, there is a token output limit in Langchain-Chatchat 0.3.1 when using the xinference-deployed QwQ and Qwen models. This limit can be configured using the MAX_TOKENS setting in Settings.model_settings. If MAX_TOKENS is set, it is included in the params dictionary as params["max_tokens"]. Additionally, the default token limit for various models, including QwQ and Qwen, is set to 4000 tokens as specified in the frontend/src/config/modelProviders/chatchat.ts file [1][2].
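
For readers looking for where this bites, here is a minimal sketch of the flow dosubot describes: the configured limit is forwarded to the request only when it is set. Names are paraphrased from the comment above, not copied from the chatchat source.

```python
# Minimal sketch of the flow dosubot describes above: the configured
# MAX_TOKENS is forwarded as params["max_tokens"] only when it is set.
# Names are paraphrased from the comment, not the actual chatchat source.
MAX_TOKENS = 4000  # the frontend default cited above (chatchat.ts)

params = {
    "model": "qwen2.5-instruct",      # assumed Xinference model UID
    "messages": [{"role": "user", "content": "..."}],
}
if MAX_TOKENS:                        # unset/None -> backend default applies
    params["max_tokens"] = MAX_TOKENS # caps generated tokens per reply
```

If the configured value is smaller than the answer the model wants to produce, generation is cut off mid-sentence, which matches the truncation shown in the screenshot.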

I found a similar open issue discussing how to calculate the token count and dynamically limit it to ensure it does not exceed the maximum model length, but no solution was found in the comments [3].


@Longleaves
Author

I assumed that leaving the field blank would fall back to the model's default maximum length, but after filling in a value the output is complete. Is there now a setting for the chain-of-thought length of reasoning models?
[screenshot: updated settings]
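
On the chain-of-thought question, a hedged sketch rather than a chatchat API: to my knowledge 0.3.1 exposes no separate reasoning-length setting, and for QwQ-style models the reasoning segment shares the same max_tokens budget as the final answer, so raising MAX_TOKENS is the practical lever. The endpoint URL, model UID, and the <think> tag format are assumptions about a typical Xinference deployment.

```python
# Hedged sketch: inspect how much of the max_tokens budget a QwQ-style
# model spends on its <think> reasoning segment versus the final answer.
# Endpoint URL and model UID are assumptions, not chatchat defaults.
import re
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="qwq",                      # assumed model UID in Xinference
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    max_tokens=8192,                  # budget shared by reasoning + answer
)
text = resp.choices[0].message.content
m = re.match(r"\s*<think>(.*?)</think>(.*)", text, re.S)
if m:
    reasoning, answer = m.group(1), m.group(2)
    print(f"~{len(reasoning)} chars of reasoning before the answer")
    print("answer:", answer.strip())
else:
    print(text)  # model did not emit <think> tags
```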
