Why is there so much redundant and irrelevant prompt data in the question classifier? #17035

liujiawei9 · 2025-03-28T08:13:51Z

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Content

The question classifier often fails to classify accurately.
In the run trace log, I found that the question classifier adds many irrelevant prompt words. Does this consume too many tokens and hurt both the LLM's performance and its classification judgment?

My input was only two words, "nice job", to be sorted into 正面评价 (positive review) or 负面评价 (negative review), but the question classifier received a very long prompt.

The complete JSON from the trace is as follows:
{
"model_mode": "chat",
"prompts": [
{
"role": "system",
"text": "\n ### Job Description',\n You are a text classification engine that analyzes text data and assigns categories based on user input or automatically determined categories.\n ### Task\n Your task is to assign one categories ONLY to the input text and only one category may be assigned returned in the output. Additionally, you need to extract the key words from the text that are related to the classification.\n ### Format\n The input text is in the variable input_text. Categories are specified as a category list with two filed category_id and category_name in the variable categories. Classification instructions may be included to improve the classification accuracy.\n ### Constraint\n DO NOT include anything other than the JSON array in your response.\n ### Memory\n Here are the chat histories between human and assistant, inside XML tags.\n \n \n \n",
"files": []
},
{
"role": "user",
"text": "\n { \"input_text\": [\"I recently had a great experience with your company. The service was prompt and the staff was very friendly.\"],\n \"categories\": [{\"category_id\":\"f5660049-284f-41a7-b301-fd24176a711c\",\"category_name\":\"Customer Service\"},{\"category_id\":\"8d007d06-f2c9-4be5-8ff6-cd4381c13c60\",\"category_name\":\"Satisfaction\"},{\"category_id\":\"5fbbbb18-9843-466d-9b8e-b9bfbb9482c8\",\"category_name\":\"Sales\"},{\"category_id\":\"23623c75-7184-4a2e-8226-466c2e4631e4\",\"category_name\":\"Product\"}],\n \"classification_instructions\": [\"classify the text based on the feedback provided by customer\"]}\n",
"files": []
},
{
"role": "assistant",
"text": "\njson\n {\"keywords\": [\"recently\", \"great experience\", \"company\", \"service\", \"prompt\", \"staff\", \"friendly\"],\n \"category_id\": \"f5660049-284f-41a7-b301-fd24176a711c\",\n \"category_name\": \"Customer Service\"}\n\n",
"files": []
},
{
"role": "user",
"text": "\n {\"input_text\": [\"bad service, slow to bring the food\"],\n \"categories\": [{\"category_id\":\"80fb86a0-4454-4bf5-924c-f253fdd83c02\",\"category_name\":\"Food Quality\"},{\"category_id\":\"f6ff5bc3-aca0-4e4a-8627-e760d0aca78f\",\"category_name\":\"Experience\"},{\"category_id\":\"cc771f63-74e7-4c61-882e-3eda9d8ba5d7\",\"category_name\":\"Price\"}],\n \"classification_instructions\": []}\n",
"files": []
},
{
"role": "assistant",
"text": "\njson\n {\"keywords\": [\"bad service\", \"slow\", \"food\", \"tip\", \"terrible\", \"waitresses\"],\n \"category_id\": \"f6ff5bc3-aca0-4e4a-8627-e760d0aca78f\",\n \"category_name\": \"Experience\"}\n\n",
"files": []
},
{
"role": "user",
"text": "\n '{\"input_text\": [\"nice job\"],',\n '\"categories\": [{\"category_id\": \"1711529038361\", \"category_name\": \"正面评价\"}, {\"category_id\": \"1711529041725\", \"category_name\": \"负面评价\"}], ',\n '\"classification_instructions\": [\"\"]}'\n",
"files": []
},
{
"role": "user",
"text": "nice job",
"files": []
}
],
"usage": {
"prompt_tokens": 729,
"prompt_unit_price": "0.0005",
"prompt_price_unit": "0.001",
"prompt_price": "0.0003645",
"completion_tokens": 38,
"completion_unit_price": "0.0015",
"completion_price_unit": "0.001",
"completion_price": "0.000057",
"total_tokens": 767,
"total_price": "0.0004215",
"currency": "USD",
"latency": 0.6129214520005917
},
"finish_reason": "stop"
}
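The `usage` block above answers part of the token question directly. A minimal Python sketch, assuming each price is computed as tokens × unit price × price unit (an inference from the numbers in this trace, not from Dify documentation), reproduces the reported prices and shows how much of the call the prompt accounts for:

```python
from decimal import Decimal

# Values copied from the "usage" object in the trace above.
usage = {
    "prompt_tokens": 729,
    "prompt_unit_price": "0.0005",
    "prompt_price_unit": "0.001",
    "completion_tokens": 38,
    "completion_unit_price": "0.0015",
    "completion_price_unit": "0.001",
}


def price(tokens: int, unit_price: str, price_unit: str) -> Decimal:
    """Cost for a token count; ASSUMES price = tokens * unit_price * price_unit."""
    return Decimal(tokens) * Decimal(unit_price) * Decimal(price_unit)


prompt_price = price(usage["prompt_tokens"], usage["prompt_unit_price"], usage["prompt_price_unit"])
completion_price = price(
    usage["completion_tokens"], usage["completion_unit_price"], usage["completion_price_unit"]
)
total_price = prompt_price + completion_price

# Fraction of all tokens in this call that were spent on the prompt.
prompt_share = usage["prompt_tokens"] / (usage["prompt_tokens"] + usage["completion_tokens"])

print(prompt_price, completion_price, total_price)
print(f"{prompt_share:.1%}")  # 95.0%
```

The computed values match the trace's `prompt_price`, `completion_price`, and `total_price`. With 729 prompt tokens against 38 completion tokens, about 95% of the tokens in this call went to the prompt, even though the user's input was two words.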

dosubot[bot] · 2025-03-28T08:15:52Z

I found 2 similar discussions that might be helpful:

Few-Shot Question Classifier is token-hungry - This thread discusses issues related to token consumption in a question classifier, which may relate to your concerns about redundant prompt data.
Question Classification function in workflow maybe abnormal - This thread addresses potential abnormalities in the question classification function, which could be relevant to your issue with classification accuracy.

qq1273834091 · 2025-03-28T08:31:59Z

The question classifier really isn't very good. Do you have a better way to improve its accuracy?

Reply from liujiawei9 (Author) · Mar 28, 2025

Using a reasoning model greatly improves accuracy, but it is too slow.

liujiawei9 (Author) · 2025-03-28T09:19:47Z

It seems that the redundant and irrelevant prompts come from this file: https://github.com/langgenius/dify/blob/main/api/core/workflow/nodes/question_classifier/template_prompts.py
But since I have already written my own prompts, why are these template prompts still being added? They are interfering heavily with my classifications.
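To make the structure of the trace easier to see, here is a hypothetical Python sketch of how the messages appear to be assembled, reconstructed only from the trace above rather than from template_prompts.py. `build_messages`, `SYSTEM_PROMPT`, and `FEW_SHOT_EXAMPLES` are illustrative names, and the template texts are abridged. The sketch also emits clean JSON for the wrapped query, whereas the trace shows stray quote characters in that turn.

```python
import json

# HYPOTHETICAL reconstruction of the message order seen in the trace.
# This is not Dify's code; names and (abridged) texts are illustrative.
SYSTEM_PROMPT = "### Job Description ... ### Memory ..."  # fixed template text

# Fixed few-shot examples as (user turn, assistant turn) pairs, abridged.
FEW_SHOT_EXAMPLES = [
    ('{"input_text": ["I recently had a great experience ..."], ...}',
     '{"keywords": [...], "category_name": "Customer Service"}'),
    ('{"input_text": ["bad service, slow to bring the food"], ...}',
     '{"keywords": [...], "category_name": "Experience"}'),
]


def build_messages(query: str, categories: list, instructions: str) -> list:
    """Assemble chat messages in the order observed in the trace."""
    messages = [{"role": "system", "text": SYSTEM_PROMPT}]
    # The examples are sent on every call, regardless of the user's instructions.
    for user_text, assistant_text in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "text": user_text})
        messages.append({"role": "assistant", "text": assistant_text})
    # The real query, wrapped in the same JSON envelope as the examples.
    payload = {
        "input_text": [query],
        "categories": categories,
        "classification_instructions": [instructions],
    }
    messages.append({"role": "user", "text": json.dumps(payload, ensure_ascii=False)})
    # The raw query is appended once more as the final user turn.
    messages.append({"role": "user", "text": query})
    return messages


categories = [
    {"category_id": "1711529038361", "category_name": "正面评价"},
    {"category_id": "1711529041725", "category_name": "负面评价"},
]
msgs = build_messages("nice job", categories, "")
print([m["role"] for m in msgs])
# ['system', 'user', 'assistant', 'user', 'assistant', 'user', 'user']
```

Laid out this way, the issue is easy to see: the system prompt and both few-shot examples go out on every call, whatever instructions the user configured, and "nice job" reaches the model twice.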

Reply from qq1273834091 · Mar 28, 2025

Thank you very much for the pointer! If you find a better way to improve how the classifier recognizes prompts, please share it.
