fix(wren-ai-service): The DeepSeek response may not be a valid JSON format. #1529
base: main
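Walkthrough
The changes add a new utility function, extract_braces_content, which the _run method applies to message.content before appending the result to the replies list.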
Sequence Diagram(s)
sequenceDiagram
participant Run as _run Method
participant Util as extract_braces_content
Run->>Util: extract_braces_content(message.content)
Util-->>Run: extractedContent
Note over Run: Append extractedContent to replies list
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-ai-service/src/utils.py (1)
161-169: Add documentation to improve clarity and address potential edge cases
The implementation of extract_braces_content extracts content within braces correctly, but it lacks documentation explaining its purpose, behavior, and potential limitations. While it works for JSON objects, it doesn't handle JSON arrays that start with [ and end with ].
Add a descriptive docstring and consider handling JSON arrays:
 # For adapting deepseek response content contains "```json{....}```"
 def extract_braces_content(resp: str) -> str:
+    """
+    Extracts content enclosed within the outermost braces from a string.
+
+    This function is specifically designed to handle DeepSeek responses that may contain
+    extraneous characters like "```json" at the beginning and "```" at the end.
+
+    Args:
+        resp (str): The string potentially containing JSON enclosed in braces
+
+    Returns:
+        str: The extracted JSON content if valid braces are found, otherwise the original string
+
+    Note:
+        This function only handles JSON objects (enclosed in {}), not JSON arrays (enclosed in []).
+    """
     start = resp.find('{')
     end = resp.rfind('}')
     if start == -1 or end == -1 or end <= start:
         return resp
     return resp[start:end+1]
wren-ai-service/src/providers/llm/litellm.py (1)
110-110: Properly fixed the DeepSeek JSON formatting issue
Good implementation of the fix for the DeepSeek response formatting issue. The code now correctly extracts the content within braces before returning it in the replies list, which resolves the problem with DeepSeek returning responses with extraneous characters.
Consider adding a comment to explain why this transformation is necessary:
 return {
-    "replies": [extract_braces_content(message.content) for message in completions],
+    # Extract JSON content from responses (fixes DeepSeek's "```json{...}```" formatting issue)
+    "replies": [extract_braces_content(message.content) for message in completions],
     "meta": [message.meta for message in completions],
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
wren-ai-service/src/providers/llm/litellm.py (2 hunks)
wren-ai-service/src/utils.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
wren-ai-service/src/providers/llm/litellm.py (1)
wren-ai-service/src/utils.py (1)
extract_braces_content (162-169)
🔇 Additional comments (2)
wren-ai-service/src/providers/llm/litellm.py (2)
20-20: Import added correctly for the newly implemented utility function
The import statement for the new extract_braces_content function has been correctly added.
72-82: Verify this fix handles all DeepSeek response format variations
The implemented solution addresses the specific case mentioned in the PR description, but it's important to verify that it handles all possible response formats from DeepSeek.
Could you verify that the fix works for all variations of DeepSeek responses by testing with different response formats? Look for edge cases where the content might not be properly extracted, such as:
- Responses with multiple JSON objects
- Responses with JSON arrays instead of objects
- Responses with nested JSON structures
- Responses with no JSON-like content
This testing will ensure the robustness of the solution for all potential DeepSeek response formats.
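The edge cases listed above can be probed with a small script. This is only a sketch: the helper is copied from the diff in this PR, while parses_as_json and the sample replies are hypothetical illustrations, not captured DeepSeek output.

```python
import json


def extract_braces_content(resp: str) -> str:
    """Copy of the helper from this PR: slice from the first '{' to the last '}'."""
    start = resp.find("{")
    end = resp.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return resp
    return resp[start:end + 1]


def parses_as_json(text: str) -> bool:
    """Hypothetical helper: True if text is exactly one valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


FENCE = "`" * 3  # triple backtick, built programmatically for readability

# Hypothetical sample replies covering the four cases in the review comment.
samples = {
    "nested": FENCE + 'json\n{"a": {"b": [1, 2]}}\n' + FENCE,
    "multiple_objects": '{"a": 1}\n{"b": 2}',
    "array": FENCE + 'json\n[{"a": 1}, {"b": 2}]\n' + FENCE,
    "no_json": "Sorry, I cannot answer that.",
}

# For each case, record whether the extracted text parses as JSON.
results = {
    name: parses_as_json(extract_braces_content(text))
    for name, text in samples.items()
}

for name, ok in results.items():
    print(f"{name}: {'parses' if ok else 'does not parse'}")
```

Under these assumptions only the nested object parses after extraction. Two concatenated objects are returned as one span, a top-level array loses its brackets, and text without braces is passed through unchanged, so callers still need to handle a json.JSONDecodeError.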
When testing WrenAI with DeepSeek, I encountered a JSON formatting exception. After installing Langfuse, I noticed that DeepSeek frequently returns responses in JSON_OBJECT format with extra characters - specifically starting with ```json and ending with ```. To accommodate this behavior, we need to implement substring extraction based on curly brace positions to ensure successful JSON parsing.
A similar problem was also reported in an earlier issue:
https://github.com/Canner/WrenAI/issues/1354#issuecomment-2704457311
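As a rough illustration of the substring extraction described above, the following sketch shows a fenced reply failing to parse directly and then parsing after extraction. The helper body matches the diff in this PR; the sample reply and its "sql" field are assumptions made for the example.

```python
import json


def extract_braces_content(resp: str) -> str:
    """Copy of the helper from this PR: slice from the first '{' to the last '}'."""
    start = resp.find("{")
    end = resp.rfind("}")
    if start == -1 or end == -1 or end <= start:
        return resp
    return resp[start:end + 1]


FENCE = "`" * 3  # triple backtick, built programmatically for readability

# Hypothetical reply shaped like the one described in the PR description.
raw_reply = FENCE + 'json\n{"sql": "SELECT 1"}\n' + FENCE

# Parsing the raw reply fails because of the surrounding fence markers.
try:
    json.loads(raw_reply)
    raw_parsed_ok = True
except json.JSONDecodeError:
    raw_parsed_ok = False

# After extraction, the same reply parses cleanly.
parsed = json.loads(extract_braces_content(raw_reply))
print(raw_parsed_ok, parsed)
```

Slicing by brace position is deliberately simple: it does not depend on the exact fence text, but it assumes the reply contains a single top-level JSON object.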