feat: add transcription with `experimental_transcribe` #5496

haydenbleasel · 2025-04-02T02:31:44Z

This PR lays the foundation for migrating Orate into the AI SDK, starting with audio transcription. The changes span documentation, implementation, and tests for the new transcribe function.

Documentation Updates:

content/docs/03-ai-sdk-core/36-transcription.mdx: Added a new documentation page for the transcription feature, including usage examples, settings, and error handling.
content/docs/03-ai-sdk-core/index.mdx: Updated the index to include a link to the new transcription documentation.
content/docs/07-reference/01-ai-sdk-core/11-transcribe.mdx: Added an API reference page for the transcribe function, detailing parameters, return values, and examples.
content/providers/01-ai-sdk-providers/02-openai.mdx: Documented the OpenAI transcription models and their capabilities.

New API Implementation:

packages/ai/core/generate-transcript/generate-transcript-result.ts: Defined the TranscriptionResult interface to structure the transcription output.
examples/ai-core/src/generate-transcript/openai.ts: Added an example script demonstrating how to use the transcription feature with OpenAI.

Test Coverage:

packages/ai/core/generate-transcript/generate-transcript.test.ts: Implemented tests for the transcribe function, covering argument handling, warnings, transcript generation, and error scenarios.

Together, these changes add an experimental transcription feature to the AI SDK, along with its documentation and tests.
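For orientation, here is a minimal, self-contained sketch of the flow described above: a transcribe call that delegates to a model, returns a structured result, and fails loudly when no transcript comes back. It is not the code in this PR. The field names (`text`, `warnings`), the `doGenerate` method, the error class, and the mock model are all assumptions made for illustration.

```typescript
// Illustrative only: these shapes are assumptions, not copied from the PR's source files.
interface TranscriptionWarning {
  message: string;
}

interface TranscriptionResult {
  text: string;
  warnings: TranscriptionWarning[];
}

interface TranscriptionModel {
  doGenerate(options: { audio: Uint8Array }): Promise<TranscriptionResult>;
}

// Hypothetical error raised when the model returns an empty transcript.
class NoTranscriptGeneratedError extends Error {
  constructor() {
    super('No transcript generated.');
    this.name = 'NoTranscriptGeneratedError';
  }
}

// Calls the model and rejects if it produced no text.
async function transcribe(options: {
  model: TranscriptionModel;
  audio: Uint8Array;
}): Promise<TranscriptionResult> {
  const { text, warnings } = await options.model.doGenerate({
    audio: options.audio,
  });
  if (text.length === 0) {
    throw new NoTranscriptGeneratedError();
  }
  return { text, warnings };
}

// Mock model standing in for a real provider so the sketch runs offline.
const mockModel: TranscriptionModel = {
  async doGenerate({ audio }) {
    return { text: audio.length > 0 ? 'hello world' : '', warnings: [] };
  },
};
```

A real provider would replace `mockModel`; the example script in examples/ai-core/src/generate-transcript/openai.ts is the place to look for actual usage.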

Copilot

Pull Request Overview

This PR introduces the foundations for transcribing audio by adding a new experimental_generateTranscript function along with the necessary implementation, tests, documentation, and integration in the AI SDK.

Added comprehensive audio input conversion utilities.
Implemented and integrated the generateTranscript function with the transcription provider.
Enhanced test coverage and documentation for the new transcription capabilities.

Reviewed Changes

Copilot reviewed 27 out of 31 changed files in this pull request and generated no comments.

File	Description
packages/provider-utils/src/convert-audio-input.ts	Adds audio input conversion supporting multiple formats.
packages/openai/src/openai-transcription-settings.ts	Defines types for transcription settings.
packages/openai/src/openai-transcription-model.ts	Implements transcription model logic and API interaction.
packages/openai/src/openai-transcription-model.test.ts	Provides tests for transcription model behavior.
packages/openai/src/openai-provider.ts	Integrates transcription into the OpenAI provider.
packages/ai/errors/no-transcript-generated-error.ts	Introduces a custom error for missing transcript cases.
packages/ai/core/types/transcription-model.ts	Defines types for transcription models (documentation comment needs updating).
packages/ai/core/generate-transcript/*	Implements the generateTranscript function, result types, and tests.
examples/ai-core/src/generate-transcript/openai.ts	Adds an example script for using the transcription feature.
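The table says packages/provider-utils/src/convert-audio-input.ts supports multiple input formats but does not name them. As a rough sketch of what such a conversion might look like, the snippet below normalizes three assumed formats (a `Uint8Array`, an `ArrayBuffer`, or a base64 string) into bytes. The `AudioInput` type and the `toAudioBytes` name are hypothetical and may not match the file in this PR.

```typescript
// Assumed set of accepted inputs; the PR does not list the actual formats.
type AudioInput = Uint8Array | ArrayBuffer | string; // string is treated as base64

const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes standard base64 without relying on atob or Buffer.
function decodeBase64(input: string): Uint8Array {
  const clean = input.replace(/=+$/, ''); // drop trailing padding
  const bytes: number[] = [];
  let buffer = 0; // bit accumulator
  let bits = 0; // number of valid bits currently in the accumulator
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error('Invalid base64 character at index ' + i);
    }
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return new Uint8Array(bytes);
}

// Hypothetical helper: normalizes any accepted input to a Uint8Array.
function toAudioBytes(audio: AudioInput): Uint8Array {
  if (audio instanceof Uint8Array) {
    return audio;
  }
  if (audio instanceof ArrayBuffer) {
    return new Uint8Array(audio);
  }
  return decodeBase64(audio);
}
```

Normalizing to a single byte representation at the boundary means the provider code only has to handle one type when it builds the upload request.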

Files not reviewed (4)

content/docs/03-ai-sdk-core/65-transcription.mdx: Language not supported
content/docs/07-reference/01-ai-sdk-core/11-generate-transcript.mdx: Language not supported
content/providers/01-ai-sdk-providers/02-openai.mdx: Language not supported
packages/ai/tsconfig.vitest-temp.json: Language not supported

Comments suppressed due to low confidence (2)

packages/ai/core/types/transcription-model.ts:7

The comment incorrectly references 'Image model' instead of 'Transcription model'. Please update the comment to accurately describe the transcription model.

 * Image model that is used by the AI SDK Core functions.

packages/openai/src/openai-transcription-model.ts:48

[nitpick] Consider using a dynamic filename derived from the original audio input rather than the hard-coded 'audio.wav' to improve accuracy and flexibility in file handling.

    formData.append('file', file, 'audio.wav');
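One way to act on this nitpick, sketched under the assumption that a media type for the audio is known at this point (the page does not confirm that), is to map the media type to a file extension and fall back to the current 'audio.wav'. The `audioFilename` helper and the extension table are hypothetical, not part of the PR.

```typescript
// Hypothetical mapping from media type to extension; not taken from the PR.
const EXTENSION_BY_MEDIA_TYPE: Record<string, string> = {
  'audio/wav': 'wav',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'm4a',
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/flac': 'flac',
};

// Builds an upload filename from a media type, defaulting to 'audio.wav'.
function audioFilename(mediaType: string | undefined): string {
  // Strip parameters such as "; codecs=opus" and normalize case.
  const key = (mediaType ?? '').replace(/;.*$/, '').trim().toLowerCase();
  const extension = EXTENSION_BY_MEDIA_TYPE[key] ?? 'wav';
  return 'audio.' + extension;
}
```

With a helper like this, the flagged line could become something like `formData.append('file', file, audioFilename(mediaType))`, where `mediaType` is whatever type information the caller has for the original input.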

Co-authored-by: Nico Albanese <[email protected]>

vercel bot deployed to Preview April 2, 2025 02:42 View deployment

haydenbleasel requested review from lgrammel, Copilot and shaper April 2, 2025 03:25

haydenbleasel self-assigned this Apr 2, 2025

Copilot AI reviewed Apr 2, 2025

vercel bot deployed to Preview April 2, 2025 03:26 View deployment

vercel bot deployed to Preview April 2, 2025 03:28 View deployment

haydenbleasel marked this pull request as ready for review April 2, 2025 03:38