
Rich LLM logging #4866

Merged · 7 commits · Apr 8, 2025
Conversation

owtaylor (Contributor)

Description

Instead of treating LLM logging as simply writing lines of text sequentially, emit a stream of typed data objects that include rich information about the interaction, such as timestamps, message types, tool calls, etc.

These are then collected by a new custom webview in the vscode extension, allowing a user (developer) to browse through the different interactions and view them in detail. In-progress interactions are now streamed live, instead of being held back to avoid concurrency issues.

There is a file logging implementation on top of this that includes the rich information and tries to be prettier and easier to read than the old logging; this is used for the binary, and hence for JetBrains, for now, though exposing the rich webview in JetBrains could be done too.
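
To give a flavor of the approach, here is a minimal sketch of what a typed log stream can look like. The names and shapes below are illustrative only; the actual interface lives in core/index.d.ts and may differ.

```typescript
// Hypothetical item shapes for a typed LLM log stream; the real
// definitions live in core/index.d.ts and may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

type LLMInteractionItem =
  | { kind: "start"; interactionId: string; timestamp: number; messages: ChatMessage[] }
  | { kind: "chunk"; interactionId: string; timestamp: number; text: string }
  | { kind: "toolCall"; interactionId: string; timestamp: number; name: string; arguments: string }
  | { kind: "end"; interactionId: string; timestamp: number; error?: string };
```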

Checklist

  • The relevant docs, if any, have been updated or created - I haven't done this yet - this could be done as a follow-up PR or I can update this PR.
  • The relevant tests, if any, have been updated or created - this shouldn't have any effect on tests.

Screenshots

Screencast.From.2025-03-27.16-03-32.webm


Testing instructions

  • Enable the "Enable Model Logging" preference
  • Switch to the "LLM Log" view in the bottom panel

Review guide and notes

This is a rather large set of changes; some hints:

  • The interface to the new logging can be found in core/index.d.ts
  • The main non-trivial change to existing code is to BaseLLM in core/llm/index.ts - this is where the actual logging is added. There is a lot of boilerplate that is replicated between streamChat, streamComplete, etc.; I have some ideas for how this could be cleaned up, but it would make this patch less straightforward to review. One subtle thing in the changes here is the addition of finally {} blocks to catch cancellation via a call to generator.return() (see the sketch after this list) - this should probably be documented, but that would be easier if it weren't duplicated in a bunch of places.
  • logFormatter.ts and logFormatter.test.ts are pretty big and somewhat complex, but don't have much relevance to the overall code-flow - they just adapt the new log interface into a stream.
  • Again the code in gui/src/components/llmLog/* is rather big, but it's "just" the react GUI. If you want to understand how it works, gui/src/hooks/useLLMLog.ts is a good place to start - it handles converting the linear stream of log items into state that can be efficiently rendered.
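
For reviewers unfamiliar with the generator.return() subtlety mentioned above, here is a minimal sketch of the pattern (names hypothetical, not the actual BaseLLM code): when a consumer cancels a stream by calling generator.return(), execution inside the generator resumes as if a return statement replaced the pending yield, so a finally block is the one place guaranteed to run and can emit the end-of-interaction log item.

```typescript
// Sketch of the finally-block cancellation pattern, assuming a
// hypothetical log() callback; not the actual BaseLLM implementation.
async function* streamWithLogging(
  upstream: AsyncGenerator<string>,
  log: (item: { kind: "success" | "cancel" }) => void,
): AsyncGenerator<string> {
  let completed = false;
  try {
    for await (const chunk of upstream) {
      yield chunk;
    }
    completed = true;
  } finally {
    // Runs on normal completion, on a thrown error, and - crucially -
    // when the consumer cancels via generator.return().
    log({ kind: completed ? "success" : "cancel" });
  }
}
```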

Other notes

  • If the preference is off, then there's pretty minimal overhead - we create the log items, but they are immediately discarded
  • If the preference is on, but the log view is not visible, then the only thing that happens is that the log items are collected into an array. If someone is editing with autocompletion on for a long time, the memory used by the array could be quite large, because we're saving all the context sent on every keystroke. I suspect that it will be hundreds of MB rather than hundreds of GB, but some measurement or exact calculation would be useful. If this ends up being a problem, saving only the last 100 LLM interactions or something like that is probably the right approach (see the sketch after this list). Converting to JSON and compressing would also be possible.
  • Performance when the log view is visible is going to be slower, but I don't think there's any gross inefficiency. One thing that could definitely be done is batching updates - only sending a bunch of items from the core to the log view GUI every 0.1s or something. If the history is very large, switching between the log view and other views might be slow since retainContextWhenHidden is not set (to reduce overhead when hidden). Again, limiting the amount of history saved would help.
  • This reverts "Citations made visible in the output for Perplexity Sonar research models" #4511 (cc: @ferenci84), because citations are not plumbed through the type system sufficiently to put them into the new log stream. Once citation support is more developed, it will be easy to add it here.
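
As a concrete illustration of the history-limiting idea above, here is one possible shape for a bounded interaction buffer (a later commit in this PR limits retention to the 50 most recent interactions; the class and names below are hypothetical, not the actual implementation):

```typescript
// Hypothetical bounded buffer for retained LLM interactions.
class InteractionHistory<T> {
  private interactions: T[] = [];

  constructor(private readonly limit = 50) {}

  push(interaction: T): void {
    this.interactions.push(interaction);
    // Drop the oldest entries once the cap is exceeded, so memory
    // stays bounded no matter how long the editing session runs.
    if (this.interactions.length > this.limit) {
      this.interactions.splice(0, this.interactions.length - this.limit);
    }
  }

  get all(): readonly T[] {
    return this.interactions;
  }
}
```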

Possible future extensions

  • Menu items to save an interaction in text form and/or as a raw JSON dump.
  • Some representation of timestamps within the output - tooltips, or annotations at the edge, for example
  • Include the role that the model is being used for - edit/chat/etc.
  • Optionally log and include embedding model calls used for indexing
  • Tricky: when streamFim is being used for autocompletion, visualize the effect that filtering has on the stream - show what gets filtered out.

@owtaylor owtaylor requested a review from a team as a code owner March 27, 2025 20:07
@owtaylor owtaylor requested review from RomneyDa and removed request for a team March 27, 2025 20:07

netlify bot commented Mar 27, 2025

Deploy Preview for continuedev ready!

🔨 Latest commit: 6e54e7f
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67f42f69c7030a000718759f
😎 Deploy Preview: https://deploy-preview-4866--continuedev.netlify.app

@RomneyDa (Collaborator) left a comment

@owtaylor just pair-reviewed with @Patrick-Erichsen - this is a sweet addition. The code is nicely integrated, and we agreed with almost all decisions.

  • Regarding including the llmLogger react app code in gui: we might move some files around as nitpicks later, but overall we think you made the right decisions/compromises here - it's convenient to have the same build output location for build-script copying, etc.
    - As you mentioned, it's not too bad to expose the JetBrains log webview if we want, and it's not worth doing both for a large experimental feature immediately, especially with the opt-in behavior and the fact that this improves default logging for JetBrains anyways.

@Patrick-Erichsen and I will pull and smoke test a bit more; then this should be good to merge.

@RomneyDa (Collaborator) left a comment

@owtaylor some second thoughts: I would agree with your concerns about memory - could you limit the logs array to maybe tens of thousands of items to prevent memory from being a concern? Maybe start shifting at that limit.

Also, less important, but could you do a quick review of whether using HeroIcons is reasonable rather than @vscode/codicons? Not a dealbreaker, but they should be cross-platform, and perhaps the value of matching the sidebar icon set (and not using VS Code icons in JetBrains in the future) >= the value of matching native VS Code icons. I didn't look too much into it, so if the hero icons are super lacking, don't worry about it.

@Patrick-Erichsen (Collaborator) left a comment

This is an excellent implementation! Playing with this locally was really fun. I feel like this is quickly going to become something we wonder how we lived without for so long 😅

+1 to the feedback Dallin had above; I went ahead and approved anyway, since I think it's mergeable as-is.

A few minor UI nitpicks:

  • Chatted with @TyDunn on the name for the tab - can we call it "Continue Console" rather than "LLM Logs"?
  • Could we use the phrase "Autocomplete" and "Edit" rather than "Fim" and "Completion"? Mostly to align with terminology in our docs.
Screenshot 2025-03-31 at 6 32 15 PM

Lastly - regarding the failing tests, we still have some issues with flake, so whenever you're ready to merge this just let me know and I'll re-run them til they pass.

@owtaylor (Contributor, Author) commented Apr 1, 2025

Thanks @RomneyDa and @Patrick-Erichsen for the review!!

@owtaylor some second thoughts: I would agree with your concerns about memory - could you limit the logs array to maybe tens of thousands of items to prevent memory from being a concern? Maybe start shifting at that limit.

I'll come up with an additional commit to add a limit - probably limiting the number of interactions rather than the number of items, since that will be easier for communication with the view.

Also, less important, but could you do a quick review of whether using HeroIcons is reasonable rather than @vscode/codicons? Not a dealbreaker, but they should be cross-platform, and perhaps the value of matching the sidebar icon set (and not using VS Code icons in JetBrains in the future) >= the value of matching native VS Code icons. I didn't look too much into it, so if the hero icons are super lacking, don't worry about it.

Will do.

Chatted with @TyDunn on the name for the tab - can we call it "Continue Console" rather than "LLM Logs"?

Sure - "LLM Log" has some advantage of compactness and avoiding overflow of the panel tabs, but the leap from LLM => Continue is likely not obvious to all users.

I suppose then "continue.enableModelLogging" changes to "continue.enableConsole" and so forth. I'll take a stab at it!

Could we use the phrase "Autocomplete" and "Edit" rather than "Fim" and "Completion"? Mostly to align with terminology in our docs.

Hmm, @shermanhuman made a suggestion that what should appear in the list is the role rather than the "method" - I think that makes a lot of sense and would align things with the terminology you want, but it does require the role to be passed through to the ILLM methods (probably in the options) - OK if I look at that as a follow-up?

@Patrick-Erichsen (Collaborator)

Sure - "LLM Log" has some advantage of compactness and avoiding overflow of the panel tabs, but the leap from LLM => Continue is likely not obvious to all users.

That was essentially the concern, e.g. a user seeing the tab and being unsure where it came from. Maybe a minor concern since it's opt-in, but we're about the same length as DEBUG CONSOLE 😁

Hmm, @shermanhuman made a suggestion that what should appear in the list is the role rather than the "method" - I think that makes a lot of sense and would align things with the terminology you want, but it does require the role to be passed through to the ILLM methods (probably in the options) - OK if I look at that as a follow-up?

Definitely, I'm okay with that being a follow up 👍

@owtaylor (Contributor, Author) commented Apr 1, 2025

This is a good callout, totally okay to leave it as continueLogView.

Too late :-) - using Console in the code did fix some awkwardness of llmLog as an identifier/filename component.

New version:

  • View renamed to Console, corresponding changes to filenames/configuration settings/commands
  • Switched to heroicons
  • Limited to 50 most recent interactions
  • Rebased over a trivial conflict

I think that addresses everything noted.

Lastly - regarding the failing tests, we still have some issues with flake, so whenever you're ready to merge this just let me know and I'll re-run them til they pass.

Let's see how this does. The failure in the last run had an odd screenshot - the sidebar ended up in the panel. I don't think it had anything to do with this PR, because the setting for the Log (console) view will be off by default.

@Patrick-Erichsen (Collaborator)

Hmmm, re-running the tests hit what looks like the same issue:

Keyboard Shortcuts Fresh VS Code window → sidebar closed → cmd+L with no code highlighted → opens sidebar and focuses input → cmd+L closes sidebar

In the e2e tests we do manually move the extension to the right sidebar: https://github.com/continuedev/continue/blob/main/extensions/vscode/e2e/actions/GUI.actions.ts#L15

I'm wondering if some of the iframe logic there has an assumption that there will only be a single iframe: https://github.com/continuedev/continue/blob/main/extensions/vscode/e2e/actions/GUI.actions.ts#L32

If you're able to run the e2e tests locally, I would just bump the timeout for that test to some arbitrarily long number (100000000) and that will allow you to poke around the IDE during the test to see if the feature is enabled or not.

Feel free to timebox your troubleshooting to a reasonable length and then pass it over to us, the e2e tests are a bit tricky so I'm sure we'll be able to figure it out if it feels like you're going down a rabbit hole.

@owtaylor (Contributor, Author) commented Apr 7, 2025

In the e2e tests we do manually move the extension to the right sidebar: https://github.com/continuedev/continue/blob/main/extensions/vscode/e2e/actions/GUI.actions.ts#L15

The problem with moving the extension to the right sidebar is that the second quick pick for "View: Move View" lists all View Containers, including the empty "Continue Log" one; this upsets the hardcoded index of 14 (which is the Copilot Chat view container?).

Changing selectQuickPick(4) => selectQuickPick("Continue") and selectQuickPick(14) => selectQuickPick("New Secondary Side Bar Entry") fixes the sidebar move, and seems to make the "GUI" and "KeyboardShortcuts" tests work locally for me. I'll include that in my branch and fix the conflicts.
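
For context, here is a rough sketch of that fix, assuming the e2e helpers in GUI.actions.ts use vscode-extension-tester's InputBox, whose selectQuickPick accepts either an index or a label; selecting by label survives reordering when new view containers appear. The function name below is illustrative.

```typescript
import { InputBox, Workbench } from "vscode-extension-tester";

// Illustrative helper: move the Continue view to the secondary sidebar
// by selecting quick picks by label instead of by position.
async function moveContinueToSecondarySidebar(): Promise<void> {
  await new Workbench().executeCommand("View: Move View");
  const input = await InputBox.create();
  await input.selectQuickPick("Continue"); // was selectQuickPick(4)
  await input.selectQuickPick("New Secondary Side Bar Entry"); // was selectQuickPick(14)
}
```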

owtaylor added 6 commits April 7, 2025 14:04
When we catch/log/rethrow AbortError, throw it as itself, rather than
throw Error(e.message), so that outer callers can detect it by the
exception name.

We can't add citations to the rich log until we have them plumbed
through with the correct data types. That will happen when they
get added to the UI. Since the only appearance so far was in the
old text log, just revert the addition for now.

This reverts commit 6bc0bfc.
Instead of treating LLM logging as simply writing lines of text
sequentially, emit a stream of typed data objects that include
rich information about the interaction, such as timestamps,
message types, tool calls, etc.

These are then collected by a new custom webview in the vscode
extension (the "Continue Console"), allowing a user or developer
to browse through the different interactions and view them in detail.
In progress interactions are now streamed live, instead of being held back
to avoid concurrency issues.

This webview could be also exposed for IntelliJ or file logging
could be reimplemented on top of the new framework.

Add a streaming implementation of rich LLM logging to use
if we don't have the webview connected; it converts the
LLMInteractionItems into a human-readable stream with
timestamps, handling of overlapping interactions, and so
forth.

Use this for the binary.

Only retain and display the 50 most recent interactions; this prevents
the stored items from growing without bound if the user is using
the vscode instance to edit for a long time.

Selecting the 14th view container ran into problems when an
(empty) Continue Console view container was added. This also
changes it to move to new secondary sidebar container rather than
reusing one from Copilot.
@owtaylor (Contributor, Author) commented Apr 7, 2025

The failure on this run:

 Installing extensions...
Error while installing extensions: AggregateError [ETIMEDOUT]: 
AggregateError [ETIMEDOUT]: 
Error: Command failed: ELECTRON_RUN_AS_NODE=1 "/home/runner/work/continue/continue/extensions/vscode/e2e/storage/VSCode-linux-x64/code" "/home/runner/work/continue/continue/extensions/vscode/e2e/storage/VSCode-linux-x64/resources/app/out/cli.js" --force --install-extension "ms-vscode-remote.remote-containers" --extensions-dir=/home/runner/work/continue/continue/extensions/vscode/e2e/.test-extensions

is actually a flake.

@Patrick-Erichsen (Collaborator)

@tomasz-stefaniak will hopefully be getting to the bottom of the test flake this week

@Patrick-Erichsen (Collaborator)

All tests passing!

25ee72e#diff-a6d8ec1e9272f89aa61d2e14627eef192f6b5ed76d7a6f1647991e56d9ff5ac4R21

Thanks for cleaning this up - that was bound to break at some point, so apologies that you were the first one to have to deal with it.

--

I think we're good to merge here, if you're ready as well let's hit the button 👍

@Patrick-Erichsen (Collaborator)

Handled a quick merge conflict that just arose

@owtaylor (Contributor, Author) commented Apr 7, 2025

I think we're good to merge here, if you're ready as well let's hit the button 👍

Good to go from my perspective - I'll keep an eye out for any problems that are reported (or let me know) and will look at adding some enhancements like showing the role over the next few weeks.

@RomneyDa (Collaborator) commented Apr 8, 2025

@owtaylor thanks for cleaning things up and working in the requested changes! Looks great. Merging.

@RomneyDa merged commit 1ad3d7f into continuedev:main on Apr 8, 2025 - 29 checks passed