feat (provider/openai): support file search tool #5141

Open
wants to merge 7 commits into
base: main

@matthewmichel matthewmichel commented Mar 12, 2025

Extending the openai-tools to support the "file_search" tool call documented by OpenAI here: https://platform.openai.com/docs/guides/tools-file-search?lang=javascript

Note: I originally set this up as "file_search_preview" but opted for "file_search" because OpenAI does not tag this tool as a preview feature, and the tool name in the OpenAI docs is "file_search".
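
For readers unfamiliar with the tool, here is a minimal sketch of the request body that the OpenAI Responses API expects when file search is enabled, based on the OpenAI guide linked above. The helper name `buildFileSearchRequest`, the model name, and the vector store ID are illustrative assumptions, not part of this PR, and the sketch does not show how the SDK exposes the tool.

```typescript
// Hypothetical helper (not from this PR): builds the JSON body for a
// Responses API call with the built-in file_search tool enabled.
// It only constructs the payload; no network request is made.
type FileSearchTool = {
  type: 'file_search';
  vector_store_ids: string[];
};

type ResponsesRequestBody = {
  model: string;
  input: string;
  tools: FileSearchTool[];
};

function buildFileSearchRequest(
  model: string,
  input: string,
  vectorStoreIds: string[],
): ResponsesRequestBody {
  // Assumption: a file search without any vector store is treated as an error.
  if (vectorStoreIds.length === 0) {
    throw new Error('file_search needs at least one vector store ID');
  }
  return {
    model,
    input,
    tools: [{ type: 'file_search', vector_store_ids: [...vectorStoreIds] }],
  };
}

// Placeholder values for illustration only.
const exampleBody = buildFileSearchRequest(
  'gpt-4o-mini',
  'What does the AI policy say?',
  ['vs_example123'],
);
console.log(JSON.stringify(exampleBody, null, 2));
```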

@lgrammel
Collaborator

Nice! Without parsing their special outputs, how will this work in practice?

@matthewmichel
Author

When I tested it with a vector_store_id added, it returned a response as expected through the "text" value, with references to the vector store files, which indicates to me that the file search was actually performed.

The "text" value (returned by the new OpenAI Responses API) should work as expected when using the file_search tool. I'll work on adding some handlers for the new "annotations" value that is returned by the Responses API for better clarity.

@matthewmichel
Author

matthewmichel commented Mar 12, 2025

@lgrammel I should have checked before responding. The "annotations" property is already available on the OpenAIResponsesAssistantMessage type. This should be enough to pass those annotations through the SDK response.

export type OpenAIResponsesAssistantMessage = {
  role: 'assistant';
  content: Array<{ 
    type: 'output_text'; 
    text: string; 
    annotations?: Array<{
      type: 'file_citation';
      index: number;
      file_id: string;
      filename: string;
    } | {
      type: 'url_citation';
      start_index: number;
      end_index: number;
      url: string;
      title: string;
    }>;
  }>;
};

This matches the OpenAI docs for the file search tool: https://platform.openai.com/docs/guides/tools-file-search

EDIT: I didn't realize I actually made this change 🤦‍♂️ The update to this type has been pushed to this PR.
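
As a rough illustration of how a consumer might read these annotations, the sketch below collects each cited file once from a message of the shape shown above. The helper `citedFiles` and the sample message are hypothetical and are not part of this PR; the types are redeclared locally so the snippet stands alone.

```typescript
type FileCitation = {
  type: 'file_citation';
  index: number;
  file_id: string;
  filename: string;
};

type UrlCitation = {
  type: 'url_citation';
  start_index: number;
  end_index: number;
  url: string;
  title: string;
};

type AssistantMessage = {
  role: 'assistant';
  content: Array<{
    type: 'output_text';
    text: string;
    annotations?: Array<FileCitation | UrlCitation>;
  }>;
};

// Hypothetical helper: returns each cited file once, in first-seen order.
function citedFiles(
  message: AssistantMessage,
): Array<{ fileId: string; filename: string }> {
  const seen = new Map<string, string>();
  for (const part of message.content) {
    // annotations is optional, so fall back to an empty list.
    for (const annotation of part.annotations ?? []) {
      if (annotation.type === 'file_citation' && !seen.has(annotation.file_id)) {
        seen.set(annotation.file_id, annotation.filename);
      }
    }
  }
  return Array.from(seen, ([fileId, filename]) => ({ fileId, filename }));
}

// Made-up sample data for illustration.
const sampleMessage: AssistantMessage = {
  role: 'assistant',
  content: [
    {
      type: 'output_text',
      text: 'Sample answer text.',
      annotations: [
        { type: 'file_citation', index: 12, file_id: 'file-abc', filename: 'policy.txt' },
        { type: 'file_citation', index: 40, file_id: 'file-abc', filename: 'policy.txt' },
        { type: 'url_citation', start_index: 0, end_index: 5, url: 'https://example.com', title: 'Example' },
      ],
    },
  ],
};

console.log(citedFiles(sampleMessage));
// logs one entry: { fileId: 'file-abc', filename: 'policy.txt' }
```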

@matthewmichel
Author

OK, I made some adjustments to ensure the annotation information is returned properly through the SDK. I tested this locally and got a nice sources value when trying it out with the file_search tool.

sources: [
  {
    sourceType: 'file',
    id: 'PSvUjMxv2hlJ4vuj',
    fileId: 'file-IhP983o4sPmYkgGOtVdureJO',
    filename: 'Artificial Intelligence (AI) Policy.txt'
  }
]

I've also updated the next-openai examples to account for either url or file source annotations.
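
The mapping from a Responses API annotation to a sources entry could look roughly like the sketch below. This is not the code in this PR: `toSource`, the injected `generateId` callback, and the simplified `Source` type (without providerMetadata) are assumptions made for illustration.

```typescript
type Annotation =
  | { type: 'file_citation'; index: number; file_id: string; filename: string }
  | { type: 'url_citation'; start_index: number; end_index: number; url: string; title: string };

// Simplified, assumed shape of a source entry; the real SDK type may differ.
type Source =
  | { sourceType: 'url'; id: string; url: string; title?: string }
  | { sourceType: 'file'; id: string; fileId: string; filename: string };

// Hypothetical mapper: converts one annotation into one source entry.
// The ID generator is injected so the function stays deterministic in tests.
function toSource(annotation: Annotation, generateId: () => string): Source {
  if (annotation.type === 'file_citation') {
    return {
      sourceType: 'file',
      id: generateId(),
      fileId: annotation.file_id,
      filename: annotation.filename,
    };
  }
  // Only url_citation remains after the check above.
  return {
    sourceType: 'url',
    id: generateId(),
    url: annotation.url,
    title: annotation.title,
  };
}

// Usage with a simple counter-based ID generator (illustrative only).
let nextId = 0;
const makeId = () => `src-${nextId++}`;

const mapped = [
  toSource({ type: 'file_citation', index: 3, file_id: 'file-abc', filename: 'policy.txt' }, makeId),
  toSource({ type: 'url_citation', start_index: 0, end_index: 9, url: 'https://example.com', title: 'Example' }, makeId),
];
console.log(mapped);
```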

Comment on lines +33 to +58
  | {
      /**
       * A file source. This is returned by file search RAG models.
       */
      sourceType: 'file';

      /**
       * The ID of the source.
       */
      id: string;

      /**
       * The ID of the file.
       */
      fileId: string;

      /**
       * The name of the file.
       */
      filename: string;

      /**
       * Additional provider metadata for the source.
       */
      providerMetadata?: LanguageModelV1ProviderMetadata;
    };
Collaborator

This is a potentially critical change because it also needs to work for Anthropic and Google. I would need more time to evaluate it, and I will most likely want to do this myself.

Author

I see your PR for the computer use tool. Would you like me to adjust this PR to follow those same practices for this file_search tool?
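
One general reason that adding a variant to a shared union type can be a sensitive change: consumers that switch exhaustively on sourceType stop compiling until they handle the new case. The sketch below shows that pattern with a simplified, assumed `Source` type and a hypothetical `describeSource` helper; it does not reflect the reviewer's specific concerns about other providers.

```typescript
// Simplified, assumed source union (providerMetadata omitted).
type Source =
  | { sourceType: 'url'; id: string; url: string; title?: string }
  | { sourceType: 'file'; id: string; fileId: string; filename: string };

// Hypothetical consumer: renders a short label for either source kind.
function describeSource(source: Source): string {
  switch (source.sourceType) {
    case 'url':
      return source.title ? `${source.title} (${source.url})` : source.url;
    case 'file':
      return `${source.filename} [${source.fileId}]`;
    default: {
      // If a new variant is added to Source without a matching case,
      // this assignment becomes a compile-time error.
      const unreachable: never = source;
      throw new Error(`Unhandled source: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(
  describeSource({ sourceType: 'file', id: 's1', fileId: 'file-abc', filename: 'policy.txt' }),
);
// prints "policy.txt [file-abc]"
```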

@cosbgn

cosbgn commented Mar 13, 2025

Any chance we could get this released as experimental_alpha and then migrate to a version that supports all providers? I would love to migrate a lot of Assistants to the new Responses API, but I'm blocked by the file_search limitation.

@MwSpaceLLC

Waiting for this file_search adapter,

A game changer for ALL 🙂‍↔️

@IKatsuba

This PR will close #5188

@nsenthilkumar

Is there any update on this, @lgrammel?

6 participants