# Windows llama.cpp

Some PowerShell automation to rebuild [llama.cpp](https://github.com/ggerganov/llama.cpp) for a Windows environment.

## Installation

### 1. Install Prerequisites

Download and install the latest versions of:

* [CMake](https://cmake.org/download/)
* [CUDA](https://developer.nvidia.com/cuda-downloads)
* [Git Large File Storage](https://git-lfs.com)
* [Git](https://git-scm.com/downloads)
* [Miniconda](https://conda.io/projects/conda/en/stable/user-guide/install)
* [Visual Studio 2022 - Community](https://visualstudio.microsoft.com/)
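
As a quick sanity check (not part of the original instructions; sketched for a POSIX-style shell such as the Git Bash that ships with Git for Windows), you can confirm that each command line tool is reachable on your `PATH`:

```shell
# Report which prerequisite tools are reachable on the PATH;
# "NOT FOUND" indicates an incomplete installation.
for tool in cmake nvcc git git-lfs conda; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: NOT FOUND"
  fi
done
```

In PowerShell the equivalent check is `Get-Command cmake` (and so on for each tool).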

### 2. Clone the repository from GitHub

Clone the repository to a nice place on your machine via:

```Shell
git clone --recurse-submodules git@github.com:countzero/windows_llama.cpp.git
```

### 3. Update the llama.cpp submodule to the latest version (optional)

This repository can reference an outdated version of the llama.cpp submodule. To update the submodule to the latest version, execute the following:

```Shell
git submodule update --remote --merge
```

Then add, commit and push the changes to make the update available for others:

```Shell
git add --all; git commit -am "Update llama.cpp submodule to latest commit"; git push
```

**Hint:** This step is optional because the build script always pulls the latest version.

### 4. Create a new Conda environment

Create a new Conda environment for this project with a specific version of Python:

```Shell
conda create --name llama.cpp python=3.10
```

### 5. Initialize Conda for shell interaction

To make Conda available in your current shell, execute the following:

```Shell
conda init
```

**Hint:** You can always revert this via `conda init --reverse`.

### 6. Execute the build script

To build llama.cpp binaries for a Windows environment with CUDA support, execute the script:

```PowerShell
./rebuild_llama.cpp.ps1
```

### 7. Download a large language model

Download a large language model (LLM) with weights in the GGML format into the `./vendor/llama.cpp/models` directory. For example, you can download the [open-llama-7b](https://huggingface.co/openlm-research/open_llama_7b) model in a quantized GGML format:

* https://huggingface.co/TheBloke/open-llama-7b-open-instruct-GGML/resolve/main/open-llama-7B-open-instruct.ggmlv3.q4_K_M.bin

**Hint:** See the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for best-in-class open source LLMs.

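
The download step above can also be scripted. A minimal sketch for a POSIX-style shell (the URL and filename are the examples from this section; any GGML model URL works the same way):

```shell
# Derive the target path for a model download inside the llama.cpp submodule.
MODEL_URL="https://huggingface.co/TheBloke/open-llama-7b-open-instruct-GGML/resolve/main/open-llama-7B-open-instruct.ggmlv3.q4_K_M.bin"
MODEL_DIR="./vendor/llama.cpp/models"
MODEL_FILE="$MODEL_DIR/$(basename "$MODEL_URL")"

mkdir -p "$MODEL_DIR"
echo "Target file: $MODEL_FILE"

# Resume-capable download (uncomment to fetch the multi-gigabyte file):
# curl -L --continue-at - --output "$MODEL_FILE" "$MODEL_URL"
```

In PowerShell, `Invoke-WebRequest -Uri $ModelUrl -OutFile $ModelFile` serves the same purpose.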
## Usage

### Chat

You can now chat with the model:

```PowerShell
./vendor/llama.cpp/build/bin/Release/main `
    --model "./vendor/llama.cpp/models/open-llama-7B-open-instruct.ggmlv3.q4_K_M.bin" `
    --ctx-size 2048 `
    --n-predict 2048 `
    --threads 16 `
    --n-gpu-layers 10 `
    --reverse-prompt '[[USER_NAME]]:' `
    --file "./vendor/llama.cpp/prompts/chat-with-vicuna-v1.txt" `
    --color `
    --interactive
```

### Rebuild llama.cpp

Every time there is a new release of [llama.cpp](https://github.com/ggerganov/llama.cpp), you can simply execute the script to rebuild the binaries and update the Python dependencies:

```PowerShell
./rebuild_llama.cpp.ps1
```