Replies: 1 comment
-
Doing a shift changes the positions of all the remaining tokens, e.g. if your sequence is the tokens: […]

Next time you prompt, it'll store […]

Which means the batched work (still storing G@6) makes no sense! That's the mistake which the exception is preventing.

As you've found, this is a bit awkward; if you have just one active Conversation you're in an almost unrecoverable situation. Your best option is probably to look at the […]. That's definitely a bit of an API design wart that needs cleaning up! Any ideas that would make it nicer to work with?
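To make the position mismatch concrete, here is a hypothetical illustration (the tokens A–F and the shift amount are invented for the example; only "G@6" comes from the reply above):

```
Sequence before shift:        A@0 B@1 C@2 D@3 E@4 F@5 G@6
Queued batch work:            store G@6
After ShiftLeft(count: 3):    D@0 E@1 F@2 G@3
```

After the shift, G lives at position 3, but the already-queued batch work still targets position 6, so executing it would corrupt the KV cache. Hence the exception when shifting while inference is pending.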
-
I used to do `conversation.ShiftLeft(count: (int)this.ContextSize / 2)`, but just now I got `CannotModifyWhileRequiresInferenceException`.

So what to do in this case? Inference is impossible due to running out of KV cache space, and shifting is impossible due to this exception.
Can I check the space in advance somehow? Any other suggestions?
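One possible workaround, pending a cleaner API, is to check for space and shift *before* prompting, while the conversation is still modifiable. This is an untested sketch against what I believe the LLamaSharp `BatchedExecutor` API looks like; treat the member names (`RequiresInference`, `TokenCount`, `context.Tokenize`) and the threshold logic as assumptions:

```csharp
// Untested sketch: shift pre-emptively, before the KV cache fills,
// instead of reacting after inference has already become impossible.
var promptTokens = context.Tokenize(userInput);

// Assumption: RequiresInference is true between Prompt() and Infer(),
// and the conversation can only be modified while it is false.
if (!conversation.RequiresInference
    && conversation.TokenCount + promptTokens.Length >= this.ContextSize)
{
    // Same halving strategy as before, applied ahead of time.
    conversation.ShiftLeft(count: (int)this.ContextSize / 2);
}

conversation.Prompt(promptTokens);
await executor.Infer();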