feat: add transcription with experimental_transcribe #5496

Merged
merged 104 commits into from
Apr 8, 2025

@haydenbleasel commented Apr 2, 2025

This PR lays the foundation for migrating Orate into the AI SDK, starting with audio transcription. The changes span documentation, implementation, and tests to support the new transcribe function.

Documentation Updates:

New API Implementation:

Test Coverage:

These changes collectively add a robust transcription feature to the AI SDK, complete with detailed documentation and thorough testing.

Copilot AI left a comment

Pull Request Overview

This PR introduces the foundations for transcribing audio by adding a new experimental_generateTranscript function along with the necessary implementation, tests, documentation, and integration in the AI SDK.

  • Added comprehensive audio input conversion utilities.
  • Implemented and integrated the generateTranscript function with the transcription provider.
  • Enhanced test coverage and documentation for the new transcription capabilities.
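
As a rough illustration of what "audio input conversion" can mean here, the sketch below normalizes a few common input shapes into a `Uint8Array`. It is not taken from the PR: the `AudioInput` type and `toAudioBytes` function are hypothetical names, and treating string input as base64 is an assumption.

```typescript
// Hypothetical sketch: these names are illustrative, not the SDK's actual exports.
type AudioInput = string | Uint8Array | ArrayBuffer;

// Normalizes the supported input shapes into a Uint8Array.
// Assumption: string input is base64-encoded audio data.
function toAudioBytes(audio: AudioInput): Uint8Array {
  if (typeof audio === 'string') {
    // atob is a global in browsers and in recent Node.js versions.
    const binary = atob(audio);
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i);
    }
    return bytes;
  }
  if (audio instanceof Uint8Array) {
    // Also covers Node.js Buffer, which extends Uint8Array.
    return audio;
  }
  // Remaining case: ArrayBuffer. The view shares memory with the buffer.
  return new Uint8Array(audio);
}
```

Funnelling every input through one normalizing function means downstream code, such as a provider's request builder, only has to handle a single binary type.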

Reviewed Changes

Copilot reviewed 27 out of 31 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| packages/provider-utils/src/convert-audio-input.ts | Adds audio input conversion supporting multiple formats. |
| packages/openai/src/openai-transcription-settings.ts | Defines types for transcription settings. |
| packages/openai/src/openai-transcription-model.ts | Implements transcription model logic and API interaction. |
| packages/openai/src/openai-transcription-model.test.ts | Provides tests for transcription model behavior. |
| packages/openai/src/openai-provider.ts | Integrates transcription into the OpenAI provider. |
| packages/ai/errors/no-transcript-generated-error.ts | Introduces a custom error for missing transcript cases. |
| packages/ai/core/types/transcription-model.ts | Defines types for transcription models (documentation comment needs updating). |
| packages/ai/core/generate-transcript/* | Implements the generateTranscript function, result types, and tests. |
| examples/ai-core/src/generate-transcript/openai.ts | Adds an example script for using the transcription feature. |
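
The "custom error for missing transcript cases" pattern in the file summary could look roughly like the sketch below. The class and helper are hypothetical stand-ins (`MissingTranscriptError`, `requireTranscript`), not the SDK's actual error class, whose fields and base class may differ.

```typescript
// Hypothetical stand-in for a "no transcript generated" error.
// Assumption: the raw provider responses are kept for debugging.
class MissingTranscriptError extends Error {
  readonly responses: unknown[];

  constructor(responses: unknown[]) {
    super('No transcript generated.');
    this.name = 'MissingTranscriptError';
    this.responses = responses;
  }
}

// Returns the transcript text, or throws when it is undefined or empty.
function requireTranscript(
  text: string | undefined,
  responses: unknown[],
): string {
  if (!text) {
    throw new MissingTranscriptError(responses);
  }
  return text;
}
```

A dedicated error type lets callers tell "the provider returned nothing usable" apart from network or validation failures by checking the error's name or class.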
Files not reviewed (4)
  • content/docs/03-ai-sdk-core/65-transcription.mdx: Language not supported
  • content/docs/07-reference/01-ai-sdk-core/11-generate-transcript.mdx: Language not supported
  • content/providers/01-ai-sdk-providers/02-openai.mdx: Language not supported
  • packages/ai/tsconfig.vitest-temp.json: Language not supported
Comments suppressed due to low confidence (2)

packages/ai/core/types/transcription-model.ts:7

  • The comment incorrectly references 'Image model' instead of 'Transcription model'. Please update the comment to accurately describe the transcription model.
 * Image model that is used by the AI SDK Core functions.

packages/openai/src/openai-transcription-model.ts:48

  • [nitpick] Consider using a dynamic filename derived from the original audio input rather than the hard-coded 'audio.wav' to improve accuracy and flexibility in file handling.
    formData.append('file', file, 'audio.wav');
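
One way the filename nitpick above might be addressed is to derive the extension from the audio's media type. This is a sketch under assumptions, not the change made in the PR: the mapping table and both function names are hypothetical, the media types listed are an illustrative subset, and unknown types fall back to the existing 'audio.wav' name.

```typescript
// Hypothetical mapping from media type to file extension (illustrative subset).
const EXTENSION_BY_MEDIA_TYPE: Record<string, string> = {
  'audio/wav': 'wav',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'm4a',
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/flac': 'flac',
};

// Picks a filename for the upload; unknown media types keep the old 'audio.wav'.
function fileNameForMediaType(mediaType: string): string {
  const extension = EXTENSION_BY_MEDIA_TYPE[mediaType.toLowerCase()] ?? 'wav';
  return `audio.${extension}`;
}

// Appends the audio under the 'file' field, naming it after the Blob's own type.
function appendAudioFile(formData: FormData, file: Blob): void {
  formData.append('file', file, fileNameForMediaType(file.type));
}
```

For example, a Blob created with `type: 'audio/mpeg'` would be uploaded as `audio.mp3`, while a Blob with no type would still be sent as `audio.wav`.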

@lgrammel lgrammel merged commit c21fa6d into main Apr 8, 2025
7 checks passed
@lgrammel lgrammel deleted the orate branch April 8, 2025 06:51
lgrammel pushed a commit that referenced this pull request Apr 8, 2025