Increase prefix cache memory space #15410
M0rpheus-0
Q&A · 1 comment · 2 replies
Hello!
Is there a way to increase the amount of memory vLLM uses to store the prefixes of past prompts, so that it can cache more of them?
Better yet, could some CPU RAM be shared/allocated for this purpose?
I am working with 2x H100 and have ample RAM to spare.
If not, can I at least force a specific prompt to stay cached?
Thank you!
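For concreteness, here is a rough sketch of the engine arguments I have found so far (argument names are from the vLLM `LLM`/`EngineArgs` API; the model name and values are just placeholders). As far as I can tell, cached prefixes live in the GPU KV-cache block pool, so `gpu_memory_utilization` is the only real lever, and `swap_space` reserves CPU RAM for preemption swapping rather than prefix reuse; please correct me if that is wrong.

```python
# Sketch only: the knobs that seem related to prefix-cache capacity.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=2,         # 2x H100, as in the question
    enable_prefix_caching=True,     # automatic prefix caching
    gpu_memory_utilization=0.95,    # raising this leaves more room for KV-cache
                                    # blocks, where cached prefixes are stored
    swap_space=16,                  # GiB of CPU RAM per GPU, but apparently used
                                    # for preemption swapping, not prefix reuse
)

params = SamplingParams(max_tokens=64)
# Re-sending a prompt that shares a prefix with an earlier request should hit
# the cache, as long as its blocks have not been evicted in the meantime.
out = llm.generate(["You are a helpful assistant. Summarize ..."], params)
print(out[0].outputs[0].text)
```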
-
I have the same use case. @M0rpheus-0, they recently moved discussions to a forum (I don't know why), so you won't get an answer here.