Commit c875e03

rpc : update README for cache usage
1 parent: ab6ab8f

1 file changed: +11 −0 lines changed

examples/rpc/README.md

@@ -72,3 +72,14 @@ $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name
 
 This way you can offload model layers to both local and remote devices.
 
+### Local cache
+
+The RPC server can use a local cache to store large tensors and avoid transferring them over the network.
+This can speed up model loading significantly, especially when using large models.
+To enable the cache, use the `-c` option:
+
+```bash
+$ bin/rpc-server -c
+```
+
+By default, the cache is stored in the `$HOME/.cache/llama.cpp/rpc` directory and can be controlled via the `LLAMA_CACHE` environment variable.
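For illustration, here is a minimal sketch of pointing the cache at a different location via `LLAMA_CACHE`; the directory path is a hypothetical example, not part of the commit:

```bash
# place the tensor cache under /srv/llama-cache (hypothetical path)
# instead of the default $HOME/.cache/llama.cpp/rpc
$ LLAMA_CACHE=/srv/llama-cache bin/rpc-server -c
```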
