This repository contains optimised settings for running KoboldCpp on AMD GPUs, tested with an AMD Radeon RX 6700 XT and the Llama 2 7B model.
This setup was developed and tested on an AMD Radeon RX 6700 XT (RDNA2 architecture). Whilst newer RDNA3 and RDNA4 GPUs might handle dual-GPU setups better with Vulkan, RDNA2 GPUs can experience issues when running alongside NVIDIA GPUs. This solution provides a reliable way to:
- Ensure stable operation on RDNA2 GPUs
- Avoid Vulkan-related conflicts in dual GPU setups
- Provide consistent performance regardless of GPU architecture
This setup is particularly useful for systems with multiple GPUs, especially when both AMD and NVIDIA GPUs are installed. In such configurations, other applications might try to use both GPUs with Vulkan, which can lead to conflicts and failures. By using the ROCm version of KoboldCpp, we ensure that the application:
- Specifically targets the AMD GPU
- Avoids conflicts with NVIDIA GPU operations
- Prevents Vulkan-related issues in dual-GPU setups
- Provides stable performance on the AMD GPU
Requirements:
- AMD GPU with ROCm support (tested with an RX 6700 XT)
- Windows 10/11
- Python 3.x
- KoboldCpp ROCm version
Installation:
- Clone the repository:
  git clone https://gitlab.com/CodenameCookie/koboldcpp-amd-rdna2.git
- Navigate to the project directory:
  cd koboldcpp-amd-rdna2
  Alternatively, you can open the project in Visual Studio Code:
  - Open Visual Studio Code
  - Go to File > Open Folder
  - Navigate to where you cloned the repository (e.g., C:\Users\YourUsername\Documents\koboldcpp-amd-rdna2)
  - Click "Select Folder"
  - Open the integrated terminal in VS Code using Ctrl + ` or View > Terminal
- Check if ROCm is already installed:
  - Open PowerShell and run rocm-smi
  - If the command is recognized, ROCm is already installed
  - If not, proceed with ROCm installation (a small Python sketch for scripting this check follows)
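If you prefer to script this check, the following is a minimal sketch (not part of the repository). It only tests whether rocm-smi is reachable on the PATH; a PATH-based install is an assumption here, not an official detection method.

import shutil
import subprocess

# Minimal sketch: look for rocm-smi on the PATH and, if present, print GPU status.
# Assumes the ROCm tools were added to PATH during installation.
rocm_smi = shutil.which("rocm-smi")
if rocm_smi is None:
    print("rocm-smi not found on PATH - install ROCm before continuing")
else:
    print(f"Found rocm-smi at {rocm_smi}")
    subprocess.run([rocm_smi], check=False)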
- Install ROCm for Windows (if not already installed):
  - Download and install ROCm from AMD's official website
  - Follow the installation guide for Windows
  - Make sure your GPU is supported by the installed ROCm version
- Download KoboldCpp ROCm:
  - Download the latest release from YellowRoseCx/koboldcpp-rocm
  - For Windows, download koboldcpp_rocm.exe (single file) or koboldcpp_rocm_files.zip
  - If using the zip file, extract it to your desired location
  - Place koboldcpp_rocm.exe in the root directory of this project
- Download the Llama 2 7B Chat model:
  - Create a models directory if it doesn't exist:
    mkdir -Force models
  - Download the GGUF version of Llama 2 7B Chat using PowerShell:
    Invoke-WebRequest -Uri "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf" -OutFile "models\llama-2-7b-chat.gguf"
  - Note: the download is approximately 4 GB and may take some time depending on your internet connection
  - Alternative: you can manually download the model from TheBloke's HuggingFace repository and place it in the models folder (a scripted alternative is sketched below)
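As a scripted alternative to the manual download, here is a minimal sketch using the huggingface_hub Python package (an extra dependency, installed with pip install huggingface_hub). It is an illustration rather than part of this repository.

from huggingface_hub import hf_hub_download

# Minimal sketch: fetch the same GGUF file as the Invoke-WebRequest command above.
# Note the downloaded file keeps its original name (llama-2-7b-chat.Q4_K_M.gguf),
# so either rename it to models\llama-2-7b-chat.gguf or point --model at this path.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir="models",
)
print(f"Model downloaded to {path}")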
Model details:
- Model: Llama 2 7B Chat
- Format: GGUF
- Size: 3.80 GiB
- Context Size: 2048
- Total Layers: 32
After testing various configurations, we found the optimal settings for the RX 6700 XT:
.\koboldcpp_rocm.exe --model .\models\llama-2-7b-chat.gguf --host 127.0.0.1 --port 5001 --contextsize 2048 --gpulayers 30 --blasbatchsize 2048 --blasthreads 4 --highpriority --usecublas mmq
- --gpulayers 30: Offloads 30 layers to the GPU (optimal for this 32-layer model)
- --blasbatchsize 2048: Maximum batch size for better GPU utilization
- --blasthreads 4: Reduced thread count to prevent CPU bottlenecks
- --highpriority: Improves CPU allocation
- --usecublas mmq: Enables Matrix Multiplication Quantization (MMQ) through hipBLAS
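If you want to script the launch, for example to compare different --gpulayers values as in the benchmarks below, a minimal Python sketch follows. It simply rebuilds the command line above and is not part of the repository.

import subprocess

def start_koboldcpp(gpu_layers: int = 30) -> subprocess.Popen:
    # Rebuilds the optimised command above; only --gpulayers is configurable here.
    cmd = [
        r".\koboldcpp_rocm.exe",
        "--model", r".\models\llama-2-7b-chat.gguf",
        "--host", "127.0.0.1",
        "--port", "5001",
        "--contextsize", "2048",
        "--gpulayers", str(gpu_layers),
        "--blasbatchsize", "2048",
        "--blasthreads", "4",
        "--highpriority",
        "--usecublas", "mmq",
    ]
    return subprocess.Popen(cmd)

if __name__ == "__main__":
    server = start_koboldcpp(30)  # same settings as the command above
    server.wait()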
Previous configurations:
- 43 layers: 17.69s, 2.94 tokens/s
- 27 layers: 15.76s, 3.05 tokens/s
- 20 layers: 16.54s, 3.14 tokens/s
Optimized configuration:
- 30 layers with MMQ: 6.29s, 7.79 tokens/s
Usage:
- Stop any existing KoboldCpp processes (only needed if you have already run it):
  Get-Process -Name koboldcpp_rocm -ErrorAction SilentlyContinue | Stop-Process -Force
- Start KoboldCpp with the optimised settings:
  .\koboldcpp_rocm.exe --model .\models\llama-2-7b-chat.gguf --host 127.0.0.1 --port 5001 --contextsize 2048 --gpulayers 30 --blasbatchsize 2048 --blasthreads 4 --highpriority --usecublas mmq
- Test the performance:
  python test_inference.py
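The contents of test_inference.py are not reproduced here; the sketch below shows roughly what such a timing test can look like, assuming the server above is listening on 127.0.0.1:5001 and exposes KoboldCpp's KoboldAI-compatible /api/v1/generate endpoint. Treat it as an illustration, not the repository's actual script.

import json
import time
import urllib.request

# Illustrative timing test (not the repository's actual test_inference.py).
URL = "http://127.0.0.1:5001/api/v1/generate"
payload = {
    "prompt": "Explain what a GGUF model file is in one paragraph.",
    "max_length": 128,
    "temperature": 0.7,
}

start = time.time()
request = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
elapsed = time.time() - start

print(result["results"][0]["text"])
# max_length / elapsed is only an upper bound on generation speed,
# since the reply may be shorter and elapsed includes prompt processing.
print(f"Elapsed: {elapsed:.2f}s (<= {payload['max_length'] / elapsed:.2f} tokens/s)")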
Notes:
- The model requires approximately 3.80 GiB of VRAM
- The optimised settings use hipBLAS for better GPU utilisation
- High priority mode is recommended for better CPU allocation
- The context size of 2048 provides a good balance between performance and memory usage
We welcome contributions to improve this setup! Here's how you can help:
Reporting issues:
- Please check if the issue has already been reported
- Include your system specifications (GPU model, ROCm version, etc.)
- Provide detailed steps to reproduce the issue
- Include any error messages or logs
Submitting changes:
- Fork the repository
- Create a new branch for your feature (git checkout -b feature/amazing-feature)
- Make your changes
- Test thoroughly with your AMD GPU setup
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code guidelines:
- Follow the existing code style
- Keep code comments clear and concise
- Update documentation for any new features
Testing:
- Test changes with different AMD GPU models
- Verify performance improvements
- Check compatibility with different ROCm versions
- Ensure no regressions in existing functionality
By contributing, you agree that your contributions will be licensed under the same terms as the project.