Show warning message when last user input gets pruned #4816

Open · wants to merge 1 commit into main

Conversation

Jazzcort
Contributor

Description

If the user's last input is pruned due to context overflow, a warning message will be displayed in the chat section, alerting them that some details may have been lost. As a result, the response they receive might be incomplete or inaccurate due to the truncated input.
Granite-Code/granite-code#22
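
In rough terms, the check amounts to comparing the last user message before and after pruning. A minimal sketch of that idea, with hypothetical names rather than the PR's actual code:

```typescript
// Illustrative sketch only (hypothetical names, not the PR's actual code):
// compare the last user message before and after pruning, and warn if it
// was truncated or dropped.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

function lastUserMessage(history: ChatMessage[]): ChatMessage | undefined {
  return [...history].reverse().find((m) => m.role === "user");
}

function shouldWarnAboutPruning(
  original: ChatMessage[],
  pruned: ChatMessage[],
): boolean {
  const before = lastUserMessage(original);
  const after = lastUserMessage(pruned);
  // Warn when the last user input was dropped entirely or its content changed.
  return (
    before !== undefined &&
    (after === undefined || after.content !== before.content)
  );
}
```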

Screenshots

[screenshot: show-warning-in-chat]

Testing instructions

Set the model’s context length to a small value (e.g., 512) and continue asking questions until the limit is reached. Once exceeded, a warning message will appear at the bottom of the chat section, indicating that some input may have been truncated. Deleting previous messages will remove the warning.
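
For example, assuming a config.json-based setup, a model's context window can be capped with the contextLength field (exact fields may vary by provider and Continue version):

```json
{
  "models": [
    {
      "title": "Small-context test model",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "contextLength": 512
    }
  ]
}
```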

@Jazzcort Jazzcort requested a review from a team as a code owner March 25, 2025 18:21
@Jazzcort Jazzcort requested review from RomneyDa and removed request for a team March 25, 2025 18:21
netlify bot commented Mar 25, 2025

Deploy Preview for continuedev ready!

🔨 Latest commit: cdf373c
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67e2f40fcf29db0008a7ecdf
😎 Deploy Preview: https://deploy-preview-4816--continuedev.netlify.app

Collaborator

@RomneyDa RomneyDa left a comment

@Jazzcort this could be a great addition. Could you explore solutions that avoid injecting a new warning message type into the chat messages, along with a more subtle warning UI?

@Jazzcort
Contributor Author

@RomneyDa I'll try to find another way to send the warning message back to the webview.

@RomneyDa
Collaborator

RomneyDa commented Mar 28, 2025

@Jazzcort I'd be interested in having this sort of "stream warning" idea as well, so feel free to bounce approaches/ideas here before spending too much time on them! I think there could be several different approaches where the warnings aren't persisted to chat history: maybe passing them with streams but with a "warning:" field that is captured in redux streamUpdate and temporarily added to the UI, or something similar. Let me know your thoughts!
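
A rough sketch of what that could look like, using hypothetical types and a hypothetical slice rather than Continue's actual redux code:

```typescript
import { createSlice, PayloadAction } from "@reduxjs/toolkit";

// Hypothetical stream chunk shape: chunks may carry an optional warning
// that is surfaced in UI state without being persisted to chat history.
interface StreamUpdatePayload {
  content?: string;
  warning?: string; // e.g. "Your last message was truncated to fit the context window."
}

interface SessionState {
  streamedContent: string;
  transientWarning?: string;
}

const initialState: SessionState = { streamedContent: "" };

const sessionSlice = createSlice({
  name: "session",
  initialState,
  reducers: {
    streamUpdate(state, action: PayloadAction<StreamUpdatePayload>) {
      if (action.payload.content) {
        state.streamedContent += action.payload.content;
      }
      // Captured for display only; never written into chat history.
      if (action.payload.warning) {
        state.transientWarning = action.payload.warning;
      }
    },
    clearWarning(state) {
      state.transientWarning = undefined;
    },
  },
});

export const { streamUpdate, clearWarning } = sessionSlice.actions;
export default sessionSlice.reducer;
```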

@Jazzcort
Contributor Author

I've already implemented a second approach in the following branch: Jazzcort/warn-when-truncate-last-msg-v2.

Instead of sending the warning through the stream, I used the messenger system to deliver the warning message. With this approach, I make the pruning behavior occur before calling streamChat so I can reach the messenger reference. The advantage of this approach is that users receive the warning message before the streamed response rather than after it has finished.

Regarding the "warning:" field, are you suggesting adding it to AssistantChatMessage? I think that could work as well! I'm open to either approach—whichever aligns better with the project's design. We can also discuss the UI implementation afterward.
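
A rough sketch of the messenger-based flow described above (the API shape and names here are hypothetical, not the actual branch code):

```typescript
// Hypothetical sketch: prune before streaming and push a one-off warning to
// the webview first, so users see it before the response starts streaming.
type ChatMessage = { role: string; content: string };

declare function pruneToContextLength(history: ChatMessage[]): {
  prunedHistory: ChatMessage[];
  lastInputTruncated: boolean;
};
declare function streamChat(history: ChatMessage[]): AsyncGenerator<string>;

async function streamChatWithWarning(
  messenger: { send(type: string, payload: unknown): void },
  history: ChatMessage[],
): Promise<AsyncGenerator<string>> {
  const { prunedHistory, lastInputTruncated } = pruneToContextLength(history);
  if (lastInputTruncated) {
    // Delivered via the messenger, not appended to chat history.
    messenger.send("chat/warning", {
      message:
        "Part of your last message was pruned to fit the model's context window; the response may be incomplete.",
    });
  }
  return streamChat(prunedHistory);
}
```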

@owtaylor
Contributor

Another idea would be to extend @Jazzcort's last approach and have two separate calls from webview => core.

  1. llm/pruneChat => {prunedMessages, warning?: string}
  2. llm/streamChat

or something like that. (llm.streamChat could still prune itself when it's not being invoked from the chatview.)
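
Sketched as webview-side pseudocode (the types and endpoint payloads are illustrative, not the actual protocol definitions):

```typescript
type ChatMessage = { role: string; content: string };

interface PruneChatResult {
  prunedMessages: ChatMessage[];
  warning?: string; // set when e.g. the last user input had to be truncated
}

declare function request<T>(endpoint: string, payload: unknown): Promise<T>;
declare function showTransientWarning(message: string): void;

async function sendFromWebview(history: ChatMessage[]): Promise<void> {
  // 1. Prune first, so any warning can be surfaced before streaming begins.
  const { prunedMessages, warning } = await request<PruneChatResult>(
    "llm/pruneChat",
    { messages: history },
  );
  if (warning) {
    showTransientWarning(warning); // UI-side banner, never persisted to history
  }

  // 2. Stream with the already-pruned messages.
  await request<void>("llm/streamChat", { messages: prunedMessages });
}
```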

This would avoid having to worry about the interaction between warnings and streaming. It could also be potentially useful for some other things:

  • prefix-caching-sensitive pruning (project writeup: Context stability - taking advantage of prefix caching, Granite-Code/granite-code#96)
  • having some way to reveal the pruned messages in the UI. I'm not at all sure that this is a good idea - the user can look at logs - but I do feel that it can be deceptive to have a rich chat history with no indication that only a tiny fraction of it might actually be sent to the model.

@Jazzcort
Contributor Author

I agree with @owtaylor's suggestion. Calling llm/pruneChat before llm/streamChat not only helps manage context length but also provides control over whether llm/streamChat is called at all. Users might not put too much weight on the last response when they see the warning message.

Additionally, leveraging Context stability - taking advantage of prefix caching is a great strategy. It can improve the user experience by reducing response time when the context limit is reached. @RomneyDa If this all sounds good, I'll start working on it.

@RomneyDa
Collaborator

RomneyDa commented Apr 1, 2025

Planning to look into this tomorrow midday!

@RomneyDa
Collaborator

RomneyDa commented Apr 2, 2025

@Jazzcort @owtaylor would agree that two separate calls are a good approach; having the warning up front would be great, and it won't affect all the other uses of streamChat in core, etc. llm/pruneChat could work. I'd be interested in approaches that don't do the pruning twice but still use the same streamChat function, perhaps some kind of alreadyPruned boolean passed to llm.streamChat. Counting tokens can be a bit expensive, but not bad.

@sestinj tagging you since this touches core streaming.
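
One possible shape for the alreadyPruned flag mentioned above (a hypothetical signature, shown only to make the idea concrete):

```typescript
type ChatMessage = { role: string; content: string };

declare function pruneToContextLength(messages: ChatMessage[]): ChatMessage[];
declare function streamFromModel(
  messages: ChatMessage[],
): AsyncGenerator<string>;

// streamChat keeps its default pruning for all existing callers, but skips
// the second (relatively expensive) token-counting pass when the chat view
// has already pruned via llm/pruneChat.
async function* streamChat(
  messages: ChatMessage[],
  options: { alreadyPruned?: boolean } = {},
): AsyncGenerator<string> {
  const toSend = options.alreadyPruned
    ? messages // trust the caller's earlier llm/pruneChat result
    : pruneToContextLength(messages);
  yield* streamFromModel(toSend);
}
```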

@Jazzcort
Contributor Author

Jazzcort commented Apr 3, 2025

I'm planning to implement llm/compileChat, which calls compileChatHistory to prune the chat messages we pass, and returns the compiled chat messages along with a boolean indicating whether we should warn users. I'll also add a boolean parameter to llm/streamChat so we won't do the pruning twice. What do you think? @owtaylor @RomneyDa @sestinj
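
A sketch of that shape (illustrative only; the real compileChatHistory lives in core and its signature may differ):

```typescript
type ChatMessage = { role: string; content: string };

interface CompileChatResult {
  compiledChatMessages: ChatMessage[];
  didPrune: boolean; // true => the webview should show the truncation warning
}

// Assumed helper; the actual compileChatHistory signature may differ.
declare function compileChatHistory(
  messages: ChatMessage[],
  contextLength: number,
): { messages: ChatMessage[]; pruned: boolean };

function compileChat(
  messages: ChatMessage[],
  contextLength: number,
): CompileChatResult {
  const { messages: compiledChatMessages, pruned } = compileChatHistory(
    messages,
    contextLength,
  );
  return { compiledChatMessages, didPrune: pruned };
}
```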
