Toggle to disable dynamic context window (dynamically calculated num_ctx) #434
reneil1337 started this conversation in Ideas
Replies: 1 comment
In addition to being able to set this to false, the ability to set num_ctx to a specific value would also solve this issue and give users full control over model loading/unloading.
Is your feature request related to a problem? Please describe.
With each document that is analyzed, the LLM is unloaded from my Ollama instance. This happens because paperless-ai passes a different num_ctx parameter in the API call each time it puts the Ollama LLM to work. I'd like to be able to disable this so that paperless-ai calls my Ollama instance without setting num_ctx in the API call, which would allow all documents to be analyzed without reloading the LLM each time. This would massively improve performance, since all documents could be analyzed in a single flow without unloading and reloading the LLM between each one.
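For illustration, here is a minimal sketch of the kind of request that triggers the reload, assuming the standard Ollama /api/generate endpoint; the model tag and the per-document token estimate are placeholders, not paperless-ai's actual code:

```typescript
// Each document gets its own estimated context size, so Ollama sees a
// different num_ctx on almost every request and reloads the model.
async function analyze(prompt: string, estimatedTokens: number): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.3:70b",                   // placeholder model tag
      prompt,
      stream: false,
      options: { num_ctx: estimatedTokens },   // varies per document → reload
    }),
  });
  const data = await res.json();
  return data.response;
}
```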
Describe the solution you'd like
I'd love to have a dropdown/toggle in the settings labeled "dynamic context window" which can be set to false. Switching that setting from true to false would make paperless-ai call the Ollama API with the default context window defined on the server side, preventing the constant LLM reloading.
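A minimal sketch of what that toggle could look like in the request-building code, assuming a hypothetical DYNAMIC_CONTEXT_WINDOW setting (not an actual paperless-ai option name):

```typescript
// Hypothetical setting; the real paperless-ai config key would differ.
const DYNAMIC_CONTEXT_WINDOW = false;

function buildOptions(estimatedTokens: number): Record<string, number> {
  if (!DYNAMIC_CONTEXT_WINDOW) {
    // Omit num_ctx entirely: Ollama falls back to the context window
    // configured on the server, and the model stays loaded in VRAM.
    return {};
  }
  // Current behavior: per-document context size, at the cost of reloads.
  return { num_ctx: estimatedTokens };
}
```

Omitting the key, rather than sending some sentinel value, is what matters: Ollama only keeps the model resident when the effective context size is unchanged between requests.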
Additional context
I run Llama 3.3 70B on my Ollama server. Unloading and reloading a model of that size takes longer than running the query requested by paperless-ai. paperless-ai also interferes with other apps, such as OpenWebUI and other services that use Ollama with the default context window: all of my other applications never unload the LLM from VRAM, but once paperless-ai starts running, all those other services lose access to the LLM.