
Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg) #12956


Closed
stduhpf opened this issue Apr 15, 2025 · 2 comments · Fixed by #13031

Comments


stduhpf commented Apr 15, 2025

Name and Version

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
version: 5139 (84778e9)
built with MSVC 19.43.34809.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 5900X + Rx 5700XT + Rx 6800

Models

DeepSeek-V2-Lite-Chat.IQ4_NL.gguf

Problem description & steps to reproduce

The program always crashes after generating a single token. Prompt processing seems to work fine.

.\llama-cli.exe -m .\models\DeepSeek-V2-Lite-Chat.IQ4_NL.gguf -ngl 99 -t 12 -p "Hello"

First Bad Commit

Most likely daa4228. I haven't done a proper bisect, but d6d2c2a was working.

Relevant log output

User: Hello

Assistant: HelloC:\Users\user\llama.cpp\ggml\src\ggml-vulkan\ggml-vulkan.cpp:ggml-vulkan.cpp:6078: GGML_ASSERT(ggml_vk_op_supports_incontiguous(op) || ggml_vk_dim01_contiguous(src0)) failed
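For context on what the failing assert checks: ggml tensors carry per-dimension element counts (`ne`) and byte strides (`nb`), and `ggml_vk_dim01_contiguous` requires the first two dimensions to be tightly packed. The sketch below is a simplified illustration inferred from the assert message, not ggml's actual code (real ggml also accounts for quantization block sizes when computing strides):

```python
# Hedged sketch: what a "dims 0 and 1 contiguous" check conceptually verifies.
# ne[i] = number of elements along dimension i, nb[i] = byte stride of dimension i.
# Simplified for unquantized types; names/logic are illustrative assumptions.

def dim01_contiguous(ne, nb, type_size):
    """True if the first two dimensions are tightly packed (no padding)."""
    return nb[0] == type_size and nb[1] == nb[0] * ne[0]

# A 4x3 f32 tensor laid out with no padding passes the check:
print(dim01_contiguous(ne=[4, 3, 1, 1], nb=[4, 16, 48, 48], type_size=4))  # True
# A view with a padded row stride (20 bytes instead of 16) fails it,
# which is the kind of non-contiguous src0 the assert rejects:
print(dim01_contiguous(ne=[4, 3, 1, 1], nb=[4, 20, 60, 60], type_size=4))  # False
```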
stduhpf changed the title from "Eval bug: Deepseek V2 no longer working with Vulkan (assert fail during tg)" to "Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg)" on Apr 15, 2025

stduhpf commented Apr 15, 2025

Also when using a batch size of 1, the crash happens during prompt processing.


stduhpf commented Apr 15, 2025

OK, I found a fix. Now it's working, but I'm also noticing a small but significant performance regression in prompt processing speed.
d6d2c2a:

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |    sm |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ------------: | -------------------: |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        pp4096 |        328.76 ± 0.44 |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        tg1024 |         91.81 ± 0.59 |

build: d6d2c2ab (5133)

With my fix on top of master (7a8be3a):

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |    sm |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ------------: | -------------------: |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        pp4096 |        323.89 ± 0.93 |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        tg1024 |         92.77 ± 0.19 |

build: 7a8be3ab (5140)
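Quantifying the two benchmark tables above (my own arithmetic, not part of the report): the pp4096 drop is outside the reported error bars, while tg1024 actually improves slightly.

```python
# Relative change between the two llama-bench runs quoted above
# (d6d2c2a baseline vs. the fix on top of master at 7a8be3a).
old_pp, new_pp = 328.76, 323.89  # pp4096 t/s
old_tg, new_tg = 91.81, 92.77    # tg1024 t/s

pp_delta = (new_pp - old_pp) / old_pp * 100
tg_delta = (new_tg - old_tg) / old_tg * 100
print(f"pp4096: {pp_delta:+.2f}%  tg1024: {tg_delta:+.2f}%")
# → pp4096: -1.48%  tg1024: +1.05%
```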
