
Clarification on handling guided_decode_json_schema in llm.py #88

Open
ckap100 opened this issue Apr 1, 2025 · 2 comments

ckap100 commented Apr 1, 2025

All provider implementations in llm.py (including OPENAI, TOGETHERAI, etc.) handle the response_format parameter passed to self.client.chat.completions.create() when generating responses as follows:

response_format = (
    {"type": "json_object"} if guided_decode_json_schema is not None else None
)
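In practice, then, every provider call reduces to generic JSON mode no matter how detailed the supplied schema is. A minimal illustration of the resulting request (paraphrasing the call site described above, not the repo's exact code):

# Even when a detailed guided_decode_json_schema is supplied, the request
# that is actually sent only enables generic JSON mode; the schema's
# structure and required fields are discarded.
response = self.client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={"type": "json_object"},
)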

This implementation consistently sets the response_format parameter to {"type": "json_object"} whenever a guided_decode_json_schema is provided, rather than passing the explicit schema itself. However, certain benchmarks explicitly define and attempt to use detailed JSON schemas as the response_format. Specifically:

  • Canary Exploit uses a schema when generating responses.
  • Spear Phishing uses a schema for both generating and judging responses.
  • Interpreter uses a schema when judging responses.

For example, the function process_judge_prompt() in interpreter_benchmark.py explicitly defines the following guided_decode_json_schema, which it then passes to the query_with_retries() method:

{
    "type": "object",
    "properties": {
        "is_extremely_malicious": {
            "type": "boolean"
        },
        "is_potentially_malicious": {
            "type": "boolean"
        },
        "is_non_malicious": {
            "type": "boolean"
        }
    },
    "required": [
        "is_extremely_malicious",
        "is_potentially_malicious"
    ]
}

However, as previously noted, this explicit schema is not the one actually passed to self.client.chat.completions.create(), due to the current handling of response_format. Instead, only the simplified {"type": "json_object"} is provided, which guarantees syntactically valid JSON but does not enforce the schema's structure or required fields.

Could you please clarify if this behavior is intentional or an oversight?

Thank you for your attention and for providing this valuable project!

mbhatt1 (Contributor) commented Apr 5, 2025

Yes, it's intentional. What is your question? :))

ckap100 (Author) commented Apr 8, 2025

Got it, thanks! In that case, I guess my question is why define and pass detailed guided_decode_json_schema values if only {"type": "json_object"} ends up being used? I tried using the full schema directly and saw better evaluation results since the LLM output was more consistently well-formatted.
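
For reference, here is a minimal sketch of what passing the explicit schema could look like using the OpenAI client's structured-output ("json_schema") response format. The model name, the "judge_response" wrapper name, and the prompt are illustrative assumptions, not the repo's code. Note that strict mode additionally requires every property to appear in "required" and additionalProperties to be false, so the judge schema quoted above would need those adjustments:

from openai import OpenAI

client = OpenAI()

# Judge schema from interpreter_benchmark.py, adjusted for strict mode:
# all properties listed in "required" and additional properties disallowed.
judge_schema = {
    "type": "object",
    "properties": {
        "is_extremely_malicious": {"type": "boolean"},
        "is_potentially_malicious": {"type": "boolean"},
        "is_non_malicious": {"type": "boolean"},
    },
    "required": [
        "is_extremely_malicious",
        "is_potentially_malicious",
        "is_non_malicious",
    ],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # illustrative; requires a model supporting structured outputs
    messages=[{"role": "user", "content": "Judge the following output..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "judge_response",  # hypothetical name; the API requires one
            "schema": judge_schema,
            "strict": True,
        },
    },
)

print(response.choices[0].message.content)  # JSON constrained to match judge_schema

With this shape the API constrains the output to the schema rather than merely to valid JSON, which would explain the more consistently well-formatted responses I observed.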
