Problem Description
After deploying the QwQ and Qwen models with Xinference, I use the Langchain-Chatchat 0.3.1 web page for Q&A. The model answers about halfway and then stops; the response is cut off. Does chatchat have an output token limit, and if so, where can it be configured? Has anyone run into a similar situation? What could be causing this?
Expected Result
The model output is complete.
Actual Result
The answer is incomplete.
Environment Information
Langchain-Chatchat version / commit: 0.3.1
Deployment method (pypi install / source / docker): pypi install
Model inference framework (Xinference / Ollama / OpenAI API, etc.): Xinference
Yes, Langchain-Chatchat 0.3.1 applies an output token limit when serving Xinference-deployed models such as QwQ and Qwen. The limit is controlled by the MAX_TOKENS setting in Settings.model_settings; when MAX_TOKENS is set, it is passed to the model in the params dictionary as params["max_tokens"]. In addition, the default token limit for several models, including QwQ and Qwen, is 4000 tokens, as specified in frontend/src/config/modelProviders/chatchat.ts [1][2].
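If replies are being cut off mid-answer, raising MAX_TOKENS is the usual fix. Below is a minimal sketch of the relevant part of model_settings.yaml (the file generated by `chatchat init` in 0.3.x installs); the exact path depends on your CHATCHAT_ROOT / data directory, and the model name and value shown here are assumptions for illustration:

```yaml
# model_settings.yaml — loaded into Settings.model_settings
DEFAULT_LLM_MODEL: qwen   # model name as registered in Xinference (assumed here)
MAX_TOKENS: 8192          # max output tokens per reply; when left unset, the
                          # platform/frontend default (e.g. 4000) applies
TEMPERATURE: 0.7
```

Note that the model launched in Xinference may also have its own max-tokens or context-length setting, so the limit may need to be raised on both sides for the change to take effect.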
I found a similar open issue discussing how to count tokens and limit them dynamically so the total does not exceed the model's maximum length, but no solution was found in the comments [3].