
Clarification on handling guided_decode_json_schema in llm.py #88

Open
ckap100 opened this issue Apr 1, 2025 · 2 comments

ckap100 commented Apr 1, 2025

All provider implementations in llm.py (including OPENAI, TOGETHERAI, etc.) handle the response_format parameter passed to self.client.chat.completions.create() when generating responses as follows:

response_format = (
    {"type": "json_object"} if guided_decode_json_schema is not None else None
)
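In practice, then, every provider call reduces to generic JSON mode no matter how detailed the supplied schema is. A minimal illustration of the resulting request (paraphrasing the call site described above, not the repo's exact code):

# Even when a detailed guided_decode_json_schema is supplied, the request
# that is actually sent only enables generic JSON mode; the schema's
# structure and required fields are discarded.
response = self.client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={"type": "json_object"},
)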

This implementation consistently sets the response_format parameter to {"type": "json_object"} whenever a guided_decode_json_schema is provided, rather than passing the explicit schema itself. However, certain benchmarks explicitly define and attempt to use detailed JSON schemas as the response_format. Specifically:

  • Canary Exploit uses a schema when generating responses.
  • Spear Phishing uses a schema for both generating and judging responses.
  • Interpreter uses a schema when judging responses.

For example, the function process_judge_prompt() in interpreter_benchmark.py explicitly defines the following guided_decode_json_schema, which it then passes to the query_with_retries() method:

{
    "type": "object",
    "properties": {
        "is_extremely_malicious": {
            "type": "boolean"
        },
        "is_potentially_malicious": {
            "type": "boolean"
        },
        "is_non_malicious": {
            "type": "boolean"
        }
    },
    "required": [
        "is_extremely_malicious",
        "is_potentially_malicious"
    ]
}

However, as previously noted, this explicit schema is not the one actually passed to self.client.chat.completions.create(), due to the current handling of response_format. Instead, only the simplified {"type": "json_object"} is provided, which guarantees syntactically valid JSON but does not enforce the schema's structure or required fields.

Could you please clarify if this behavior is intentional or an oversight?

Thank you for your attention and for providing this valuable project!

mbhatt1 (Contributor) commented Apr 5, 2025

Yes, it's intentional. What is your question? :))

ckap100 (Author) commented Apr 8, 2025

Got it, thanks! In that case, I guess my question is why define and pass detailed guided_decode_json_schema values if only {"type": "json_object"} ends up being used? I tried using the full schema directly and saw better evaluation results since the LLM output was more consistently well-formatted.
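
For reference, here is a minimal sketch of what passing the explicit schema could look like using the OpenAI client's structured-output ("json_schema") response format. The model name, the "judge_response" wrapper name, and the prompt are illustrative assumptions, not the repo's code. Note that strict mode additionally requires every property to appear in "required" and additionalProperties to be false, so the judge schema quoted above would need those adjustments:

from openai import OpenAI

client = OpenAI()

# Judge schema from interpreter_benchmark.py, adjusted for strict mode:
# all properties listed in "required" and additional properties disallowed.
judge_schema = {
    "type": "object",
    "properties": {
        "is_extremely_malicious": {"type": "boolean"},
        "is_potentially_malicious": {"type": "boolean"},
        "is_non_malicious": {"type": "boolean"},
    },
    "required": [
        "is_extremely_malicious",
        "is_potentially_malicious",
        "is_non_malicious",
    ],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # illustrative; requires a model supporting structured outputs
    messages=[{"role": "user", "content": "Judge the following output..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "judge_response",  # hypothetical name; the API requires one
            "schema": judge_schema,
            "strict": True,
        },
    },
)

print(response.choices[0].message.content)  # JSON constrained to match judge_schema

With this shape the API constrains the output to the schema rather than merely to valid JSON, which would explain the more consistently well-formatted responses I observed.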
