last_assistant_item should be updated when item_id of response.audio.delta type changes #45

Open
taisei-ide-layered opened this issue Apr 22, 2025 · 1 comment

@taisei-ide-layered

Hey Twilio team,

I noticed that when using the conversation.item.truncate event in the OpenAI Realtime API (referenced from this repo), you can run into the following error:

Audio content of X ms is already shorter than Y ms

Root Cause

This issue happens because, when truncating the audio playback, the truncate position (audio_end_ms) is sometimes set to a value (e.g., 57340ms) that exceeds the actual length of the audio content generated by OpenAI (e.g., 9250ms). OpenAI throws an error when this happens:

Audio content of 9250ms is already shorter than 57340ms
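
The arithmetic behind this error can be sketched as follows. The timestamps are hypothetical, chosen only to reproduce the numbers above, and the sketch assumes the start timestamp was recorded for an earlier response and never reset before the current one began.

```python
# Hypothetical Twilio media timestamps, in ms since the stream started.
stale_start_ms = 1_000     # recorded when an earlier response began, never reset
actual_start_ms = 49_090   # when the current response actually began
interrupt_ms = 58_340      # latest media timestamp when the caller interrupts

# Truncation point computed from the stale start timestamp.
audio_end_ms = interrupt_ms - stale_start_ms

# Audio produced for the current item (assuming playback kept pace with generation).
actual_audio_ms = interrupt_ms - actual_start_ms

print(audio_end_ms, actual_audio_ms)
```

With these values the computed truncate position is far past the end of the current item's audio, which is the condition the error message describes.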

Quick Fix

Change the following code in the send_to_twilio function from:

if response_start_timestamp_twilio is None:
    response_start_timestamp_twilio = latest_media_timestamp
    if SHOW_TIMING_MATH:
        print(f"Setting start timestamp for new response: {response_start_timestamp_twilio}ms")

    # Update last_assistant_item safely
    if response.get('item_id'):
        last_assistant_item = response['item_id']

to

if response.get("item_id") and response["item_id"] != last_assistant_item:
    response_start_timestamp_twilio = latest_media_timestamp
    last_assistant_item = response["item_id"]
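
A minimal, self-contained sketch of the same idea, so the behavior can be checked outside the server. `ResponseTimer` and its method names are hypothetical and not part of the sample repo; the variable names mirror the ones in the snippet above, and the timestamps are made up.

```python
from typing import Optional


class ResponseTimer:
    """Hypothetical helper tracking when the current assistant item started."""

    def __init__(self) -> None:
        self.last_assistant_item: Optional[str] = None
        self.response_start_timestamp_twilio: Optional[int] = None

    def on_audio_delta(self, response: dict, latest_media_timestamp: int) -> None:
        """Restart the clock whenever a response.audio.delta carries a new item_id."""
        item_id = response.get("item_id")
        if item_id and item_id != self.last_assistant_item:
            self.response_start_timestamp_twilio = latest_media_timestamp
            self.last_assistant_item = item_id

    def elapsed_ms(self, latest_media_timestamp: int) -> Optional[int]:
        """Return the candidate audio_end_ms, or None if no item is being tracked."""
        if self.response_start_timestamp_twilio is None:
            return None
        return latest_media_timestamp - self.response_start_timestamp_twilio


timer = ResponseTimer()
timer.on_audio_delta({"type": "response.audio.delta", "item_id": "item_A"}, 1_000)
timer.on_audio_delta({"type": "response.audio.delta", "item_id": "item_A"}, 1_020)
timer.on_audio_delta({"type": "response.audio.delta", "item_id": "item_B"}, 49_090)
print(timer.elapsed_ms(58_340))  # measured from item_B's first delta, not item_A's
```

Repeated deltas for the same item leave the start timestamp alone; only a change of item_id resets it.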

@morphismz

Yes, I was running into the same concern. According to the OpenAI docs, audio_end_ms should represent the "Inclusive duration up to which audio is truncated, in milliseconds."

Thus, if we want to set audio_end_ms to elapsed_time = latest_media_timestamp - response_start_timestamp_twilio, then response_start_timestamp_twilio should be the timestamp at which the currently in-progress response started, and null if there is no response in progress. Additionally, latest_media_timestamp should correspond to the timestamp at which the user started interrupting the OpenAI response.

However, both timestamps are set from the timestamps of incoming Twilio messages and thus seem somewhat divorced from what they ought to be. Am I misunderstanding the OpenAI docs or the example code, or is this a bug?
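
One defensive option, not proposed anywhere in this thread, would be to also cap audio_end_ms at the amount of audio actually received for the item, so the truncate request can never exceed the content length. The sketch below assumes the output format is G.711 mu-law at 8 kHz (one byte per sample, so 8 bytes per millisecond) and that each delta is base64-encoded audio; the helper names are hypothetical.

```python
import base64

ULAW_BYTES_PER_MS = 8  # assumption: 8 kHz G.711 mu-law, 1 byte per sample


def delta_num_bytes(delta_b64: str) -> int:
    """Return the number of raw audio bytes in one base64-encoded delta."""
    return len(base64.b64decode(delta_b64))


def clamp_audio_end_ms(elapsed_ms: int, received_audio_ms: int) -> int:
    """Keep the truncate position within [0, audio received so far]."""
    return max(0, min(elapsed_ms, received_audio_ms))


# Five hypothetical 160-byte chunks (20 ms each under the assumption above).
chunk = base64.b64encode(bytes(160)).decode("ascii")
received_bytes = sum(delta_num_bytes(chunk) for _ in range(5))
received_ms = received_bytes // ULAW_BYTES_PER_MS

print(received_ms)
print(clamp_audio_end_ms(57_340, 9_250))
```

The byte counter would need to be reset whenever item_id changes. Capping only hides a stale start timestamp rather than fixing it, so it complements the item_id check and does not replace it.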
