
Large list[Model] result type causing pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries - bigger lists a problem? #734

Open
dmyung opened this issue Jan 22, 2025 · 3 comments
Labels
bug Something isn't working

Comments

dmyung commented Jan 22, 2025

I have an agent that tries to return a list of objects as its result type. I'm using Claude 3 Haiku on AWS Bedrock via the workaround here (#118 (comment)).

For single-object returns, and even for nested and small lists, this works fine! However, when combining tools and result types in various agent patterns, I started running into the pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries error inconsistently.

This initially looked similar to #200 and #523, among other instances of this issue, but as I said, it works beautifully on small return values.

Past a certain array size, though, the response gives me the Exceeded maximum retries error.

In practice, depending on the size of the strings in the objects, it sometimes fails at 5 instances, other times at 15 or even 25.

# Version Info
```
pydantic==2.10.5
pydantic-ai-slim==0.0.18
pydantic-settings==2.7.1
pydantic_core==2.27.2
```

Code

```python
from dataclasses import dataclass
from typing import Optional

from anthropic import AsyncAnthropicBedrock
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.anthropic import AnthropicModel


class Collection(BaseModel):
    collection_id: str = Field(description='The collection identifier prefixed with COLL-')
    description: str = Field(description='A description of the collection')
    program: Optional[str] = Field(description='the program the collection is a part of, if it is null, then the assay is programless')

class DataConn:
    @classmethod
    def collections_by_program(cls, size: int, program: str | None=None) -> list[Collection]:
        ret = []
        for i in range(size):
            ret.append(
                Collection(collection_id=f'COLL-{i}', program=program, description=f'collection of things {i}')
            )
        return ret


@dataclass
class DataDeps:
    size: int
    conn: DataConn

anthropic_bedrock_client = AsyncAnthropicBedrock(aws_region='us-east-1')

model = AnthropicModel(
    model_name='anthropic.claude-3-haiku-20240307-v1:0',
    anthropic_client=anthropic_bedrock_client
)



collection_search_agent = Agent(
    model,
    deps_type=DataDeps,
    result_type=list[Collection],
    system_prompt=(
        'Extract a list of collection objects matching the search criteria.'
        ' Use `collections_by_program` to get a list of our documents.'
        ' Return a list of assays of collection_id (COLL-DDD), description, and program.'
    ),
    retries=3,
)


@collection_search_agent.tool
def collections_by_program(ctx: RunContext[DataDeps], program: str | None = None) -> list[Collection]:
    '''
    Given a program, return the collections corresponding to that program
    '''
    # pydantic_ai/agent.py : 1106
    found_collections = ctx.deps.conn.collections_by_program(size=ctx.deps.size, program=program)
    return found_collections
```

And to test it:

```python
def run_stress_test(size: int = 10):
    deps = DataDeps(size=size, conn=DataConn())
    results = collection_search_agent.run_sync('how many collections are in the orion program?', deps=deps)
    print(results)

run_stress_test(size=25)
```

```
_new_message_index=0,
    data=[
        Collection(collection_id='COLL-0', description='collection of things 0', program='orion'),
        Collection(collection_id='COLL-1', description='collection of things 1', program='orion'),
        Collection(collection_id='COLL-2', description='collection of things 2', program='orion'),
        Collection(collection_id='COLL-3', description='collection of things 3', program='orion'),
        Collection(collection_id='COLL-4', description='collection of things 4', program='orion'),
        Collection(collection_id='COLL-5', description='collection of things 5', program='orion'),
        Collection(collection_id='COLL-6', description='collection of things 6', program='orion'),
        Collection(collection_id='COLL-7', description='collection of things 7', program='orion'),
        Collection(collection_id='COLL-8', description='collection of things 8', program='orion'),
        Collection(collection_id='COLL-9', description='collection of things 9', program='orion'),
        Collection(collection_id='COLL-10', description='collection of things 10', program='orion'),
        Collection(collection_id='COLL-11', description='collection of things 11', program='orion'),
        Collection(collection_id='COLL-12', description='collection of things 12', program='orion'),
        Collection(collection_id='COLL-13', description='collection of things 13', program='orion'),
        Collection(collection_id='COLL-14', description='collection of things 14', program='orion'),
        Collection(collection_id='COLL-15', description='collection of things 15', program='orion'),
        Collection(collection_id='COLL-16', description='collection of things 16', program='orion'),
        Collection(collection_id='COLL-17', description='collection of things 17', program='orion'),
        Collection(collection_id='COLL-18', description='collection of things 18', program='orion'),
        Collection(collection_id='COLL-19', description='collection of things 19', program='orion'),
        Collection(collection_id='COLL-20', description='collection of things 20', program='orion'),
        Collection(collection_id='COLL-21', description='collection of things 21', program='orion'),
        Collection(collection_id='COLL-22', description='collection of things 22', program='orion'),
        Collection(collection_id='COLL-23', description='collection of things 23', program='orion'),
        Collection(collection_id='COLL-24', description='collection of things 24', program='orion')
    ],
    _result_tool_name='final_result',
    _usage=Usage(requests=4, request_tokens=5300, response_tokens=2728, total_tokens=8028, details=None)
)
```

```
run_stress_test(size=35)
/pydantic_ai/agent.py", line 1063, in _handle_model_response
    return await self._handle_structured_response(tool_calls, run_context, result_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
raise exceptions.UnexpectedModelBehavior(
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (3) for result validation
```

In practice with our documents, this has raised this exception at lengths as small as 4. I don't fully understand how token usage factors into this; I tried varying the size to see whether it could be hitting Claude's ~4096-token response limit, but I've seen it spill over with no problem in some runs and fall over in others.
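
A back-of-envelope check on the numbers above (hedged, since that Usage spans 4 requests, retries included): ~2728 response tokens for 25 items suggests on the order of 100 tokens per Collection in the final answer, so a few dozen items can approach a 4096-token output ceiling; a truncated final_result tool call then fails JSON validation and burns a retry. If your pydantic-ai version supports it, raising max_tokens is one thing to try. model_settings and its max_tokens key are pydantic-ai's ModelSettings API, but whether run_sync accepts model_settings in 0.0.18 is an assumption here.

```python
# Hedged sketch: raise the Anthropic output-token ceiling for a run.
# Assumes a pydantic-ai version where run_sync accepts model_settings;
# if 0.0.18 predates that, the equivalent knob may need to be set on
# the model/client configuration instead.
results = collection_search_agent.run_sync(
    'how many collections are in the orion program?',
    deps=deps,
    model_settings={'max_tokens': 4096},  # the default may be much lower
)
```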

For the most part, returning matches of 1-3 items in this agentic workflow works fine, and it may be a bit odd to return so many rows to be interpreted by the LLM as part of the workflow. But for one of our use cases this could be a large RAG response from our vector store or a full-text query, or, as in the example above, a direct SQL query for a series of documents affiliated with a program within our organization, where we expect more than 5 results to be chained into another workflow.

dmontagu (Contributor) commented Jan 22, 2025

I believe this may be due in part to a bug where we are counting each individual validation error as a retry, rather than counting a collection of validation errors as a single retry. That probably explains why this is more commonly an issue when you are using large lists.
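
For illustration, a single pydantic ValidationError over a list carries one error entry per failing check, so if each entry were counted as a retry, a large list could exhaust retries=3 in a single bad response. A minimal sketch using plain pydantic (not pydantic-ai internals), reusing the Collection model from the repro above:

```python
from pydantic import TypeAdapter, ValidationError

ta = TypeAdapter(list[Collection])
try:
    # Two incomplete items -> one ValidationError holding several error
    # entries (one per missing field per item), from a single validation pass.
    ta.validate_python([{'collection_id': 'COLL-0'}, {'collection_id': 'COLL-1'}])
except ValidationError as exc:
    print(len(exc.errors()))  # 4: two missing fields for each of two items
```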

Let me look into that, and then we can see if there is another issue going on.

Relatedly, I think we need to make it easier to debug validation errors like this; I'll think about how we can expose that more easily.
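
In the meantime, one way to inspect what the model actually sent back on the failing attempts, assuming a pydantic-ai version that exposes capture_run_messages (newer than 0.0.18, if I recall correctly), reusing the agent and deps from the repro:

```python
from pydantic_ai import capture_run_messages
from pydantic_ai.exceptions import UnexpectedModelBehavior

# Capture the raw exchange, including the retry prompts that carry the
# validation errors fed back to the model.
with capture_run_messages() as messages:
    try:
        collection_search_agent.run_sync(
            'how many collections are in the orion program?', deps=deps
        )
    except UnexpectedModelBehavior:
        for message in messages:
            print(message)
```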

dmyung (Author) commented Jan 23, 2025

Thanks. In my debugging of this issue, I saw parts of the results come back as None when the problem reared itself, depending on the size.

I'm curious why and how the validation errors are happening/accumulating for large lists: if my tool function guarantees that the returned data is valid per the return-value pydantic model, is the LLM in question being used to reparse and regenerate that content again?
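
My working assumption of the flow (not verified against the pydantic-ai source): the tool's return value is serialized into the message history, and the model then has to re-emit the entire list as the arguments of the final_result tool call, so result validation runs on model-generated JSON rather than on the objects the tool returned. Inspecting the message history from a successful run seems consistent with that:

```python
# Inspect the exchange from a successful run; expect to see the tool
# return followed by a separate final_result tool call generated by the
# model, which is what result validation actually checks.
results = collection_search_agent.run_sync(
    'how many collections are in the orion program?', deps=deps
)
for message in results.all_messages():
    print(message)
```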

sydney-runkle added the bug label on Jan 24, 2025
rasulkireev commented

I noticed this happening for me on Anthropic's latest Sonnet model, but not on Google's Gemini 2 Flash.

I have a simple result model:

```python
class BlogPostContent(BaseModel):
    description: str = Field(description="Meta description (150-160 characters) optimized for search engines")
    slug: str = Field(description="URL-friendly format using lowercase letters, numbers, and hyphens")
    tags: str = Field(description="5-8 relevant keywords as comma-separated values")
    content: str = Field(description="Full blog post content in Markdown format with proper structure and formatting")

but a long system prompt:

```
You are an experienced SEO content strategist.
You specialize in creating search-engine-optimized content that ranks well
and provides value to our target audience.
Your task is to generate an SEO-optimized blog post, given the title and
description of the desired post. Here are some specific pointers:

1. Description:
{long list of requirements}

2. Slug:
{long list of requirements}

3. Tags:
{long list of requirements}

4. Content:
{long list of requirements}
```

and the actual message:

```
- Today's Date: 2025-02-22
- Project URL: https://github.com
- Project Name: GitHub
- Project Type: SaaS
- Project Summary: {paragraph summary}
- Blog Theme: {blog theme}
- Founders: {founders}
- Key Features: {key features}
- Target Audience: {target audience}
- Pain Points: {pain points}
- Product Usage: {product usage}
- Language: English
- Links: List of links
- Primary Keyword/Title: GitHub Actions: The Ultimate Guide to DevOps Automation
- Category: DevOps Automation
- Description: This blog post provides a guide on automating workflows using GitHub Actions for CI/CD.
- Target Keywords: ['GitHub Actions', 'CI/CD', 'DevOps automation', 'workflow automation', 'continuous integration']
- Suggested Meta Description: Automate your DevOps workflows with GitHub Actions! This guide covers CI/CD implementation and optimization for faster, more reliable deployments. Read now!
```
