[BUG] 检索效果很差 (retrieval results are very poor) #5274
I found some similar threads related to retrieval issues in Langchain-Chatchat.
For your specific issue with poor retrieval performance on PDF documents containing text and tables, consider adjusting how the PDFs are loaded and chunked, in particular OCR-related settings such as the PDF_OCR_THRESHOLD parameter. These adjustments might help improve the retrieval results for your PDF documents.
@dosubot Thanks for your reply. With your help, I found that adjusting the PDF_OCR_THRESHOLD parameter can improve the retrieval results. However, retrieval for questions related to table data is still not ideal. Do you have any better suggestions?
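For reference, PDF_OCR_THRESHOLD is the knowledge-base setting that decides which images inside a PDF are run through OCR while the document is loaded. A minimal sketch of tuning it, assuming the 0.2.x-style configs/kb_config.py layout (in 0.3.x the equivalent setting lives in the kb settings file):

```python
# configs/kb_config.py (0.2.x layout; the location differs in 0.3.x)
# PDF_OCR_THRESHOLD is a (width_ratio, height_ratio) pair relative to the page size.
# Images smaller than these ratios are skipped by the OCR step, so lowering the
# values makes the PDF loader OCR more of the embedded images (slower, but it
# captures more scanned/graphical text). The shipped default is (0.6, 0.6).
PDF_OCR_THRESHOLD = (0.3, 0.3)
```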
To improve retrieval results for questions related to table data in PDF documents using Langchain-Chatchat, you can utilize an ensemble retriever that combines keyword-based BM25 retrieval with FAISS vector retrieval:

```python
import jieba
from langchain.retrievers import EnsembleRetriever
from langchain.vectorstores import VectorStore
from langchain_community.retrievers import BM25Retriever


# BaseRetrieverService is provided by the Langchain-Chatchat server code.
class EnsembleRetrieverService(BaseRetrieverService):
    @staticmethod
    def from_vectorstore(
        vectorstore: VectorStore,
        top_k: int,
        score_threshold: int | float,
    ):
        # Vector retriever backed by the existing FAISS store.
        faiss_retriever = vectorstore.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={"score_threshold": score_threshold, "k": top_k},
        )
        # Keyword (BM25) retriever over the same documents, tokenized for Chinese
        # with jieba so that table cells and terms can be matched literally.
        docs = list(vectorstore.docstore._dict.values())
        bm25_retriever = BM25Retriever.from_documents(
            docs,
            preprocess_func=jieba.lcut_for_search,
        )
        bm25_retriever.k = top_k
        # Combine both retrievers with equal weights.
        ensemble_retriever = EnsembleRetriever(
            retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
        )
        return EnsembleRetrieverService(retriever=ensemble_retriever, top_k=top_k)
```

This setup uses both BM25 and FAISS retrievers with equal weights to balance keyword-based and vector-based retrieval, potentially improving the accuracy of retrieving table data from PDF documents. Adjust the weights and top_k to suit your data.
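A self-contained sketch of the same BM25 + FAISS ensemble idea using plain LangChain classes, so it can be tried outside the Langchain-Chatchat service wrapper (the sample texts are made up, and BAAI/bge-large-zh-v1.5 is the Hugging Face counterpart of the Ollama embedding model mentioned in this issue):

```python
import jieba
from langchain.retrievers import EnsembleRetriever
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS

# Illustrative corpus: one table-like chunk and one plain-text chunk.
texts = [
    "表1 错误码对照表 | 500 | 服务器内部错误 | 重启服务并检查日志 |",
    "正文:系统部署完成后,可通过知识库页面上传 PDF 文档。",
]

embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-zh-v1.5")
vectorstore = FAISS.from_texts(texts, embedding=embeddings)
faiss_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# BM25 over the same texts, tokenized for Chinese with jieba.
bm25 = BM25Retriever.from_texts(texts, preprocess_func=jieba.lcut_for_search)
bm25.k = 3

ensemble = EnsembleRetriever(retrievers=[bm25, faiss_retriever], weights=[0.5, 0.5])
# On older LangChain versions use .get_relevant_documents(...) instead of .invoke(...).
print(ensemble.invoke("错误码 500 是什么意思?"))
```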
@dosubot I found that the latest version of Langchain-Chatchat uses both BM25 and FAISS retrievers by default when retrieving knowledge. However, it still cannot retrieve content related to table information. Is there any other solution?
To enhance the retrieval of table data in Langchain-Chatchat, the main levers are configuration and preprocessing: adjust how the PDFs are loaded and chunked, and convert structured content such as tables into a form the retrievers can match directly (one possible approach is sketched below). Adjusting configurations and preprocessing methods in this way can lead to better handling of structured data like tables.
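One common workaround, not a built-in Langchain-Chatchat feature, is to extract tables separately with a table-aware library such as pdfplumber and index each table as its own Markdown chunk, so cell values survive text splitting and can be matched by BM25 or vector search. A minimal sketch (the file name is illustrative):

```python
import pdfplumber
from langchain_core.documents import Document


def tables_as_documents(pdf_path: str) -> list[Document]:
    """Extract every table in the PDF and render each one as a Markdown chunk."""
    docs = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                if not table:
                    continue
                header, *rows = table
                md = "| " + " | ".join(str(c or "") for c in header) + " |\n"
                md += "| " + " | ".join("---" for _ in header) + " |\n"
                for row in rows:
                    md += "| " + " | ".join(str(c or "") for c in row) + " |\n"
                docs.append(
                    Document(page_content=md, metadata={"source": pdf_path, "page": page_no})
                )
    return docs


# These Documents can then be added to the knowledge base alongside the normal text chunks.
table_docs = tables_as_documents("report.pdf")
```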
@dosubot I will try these constructive suggestions, thank you very much for your reply. One last question: how can the Langchain-Chatchat framework be extended to multimodal scenarios?
To extend the Langchain-Chatchat framework to multimodal scenarios, you can leverage its existing support for multimodal data processing: configure a model backend that serves a vision-capable model and route image inputs to it alongside the text prompt (a sketch follows below). By configuring these options and utilizing the existing tools, you can extend Langchain-Chatchat to handle multimodal scenarios effectively.
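A hedged sketch of one way to handle the image-plus-text case: send the image to a vision-capable model through an OpenAI-compatible chat endpoint, which Langchain-Chatchat's API server and backends such as Xinference or Ollama expose. The base_url, port, and model name below are placeholders; substitute whatever multimodal model your deployment actually serves:

```python
import base64

from openai import OpenAI

# Placeholder endpoint: point this at your OpenAI-compatible server
# (Langchain-Chatchat API server, Xinference, Ollama, ...).
client = OpenAI(base_url="http://127.0.0.1:7861/v1", api_key="EMPTY")

with open("test1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen2-vl-7b-instruct",  # placeholder: any vision model your backend serves
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "根据这个图片中的报错信息,帮我分析原因"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```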
Bro, where did you find the retrieval flow? Looking at the code, I only see a call to a client API and no retrieval flow after that. I didn't install from source; could that be the reason?
Bro, can I add you on WeChat? 15097431207
FanZhang91 commented: test1.png (view on web). For knowledge-base conversations, the server-side entry point corresponds to this method in libs/chatchat-server/chatchat/server/api_server/kb_routes.py.
Oh, I see!
@dosu The text entered in the user interface was: "根据这个图片中的报错信息,帮我分析原因" ("Based on the error message in this image, please help me analyze the cause").
After building a knowledge base from PDF documents (containing both text and table data), I use Langchain-Chatchat for retrieval. Even without multi-turn dialogue, the retrieval results are very poor. What could be the cause, and how can it be improved?
embedding model: quentinz/bge-large-zh-v1.5
llm model: qwen2:7b