
Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg) #12956


Closed
stduhpf opened this issue Apr 15, 2025 · 2 comments · Fixed by #13031

Comments


stduhpf commented Apr 15, 2025

Name and Version

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
version: 5139 (84778e9)
built with MSVC 19.43.34809.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 5900X + Rx 5700XT + Rx 6800

Models

DeepSeek-V2-Lite-Chat.IQ4_NL.gguf

Problem description & steps to reproduce

The program always crashes after generating a single token. Prompt processing seems to work fine.

.\llama-cli.exe -m .\models\DeepSeek-V2-Lite-Chat.IQ4_NL.gguf -ngl 99 -t 12 -p "Hello"

First Bad Commit

Most likely daa4228. I haven't done a proper bisect, but d6d2c2a was working.

Relevant log output

User: Hello

Assistant: HelloC:\Users\user\llama.cpp\ggml\src\ggml-vulkan\ggml-vulkan.cpp:ggml-vulkan.cpp:6078: GGML_ASSERT(ggml_vk_op_supports_incontiguous(op) || ggml_vk_dim01_contiguous(src0)) failed
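For context on what the failing assert checks: ggml tensors carry per-dimension element counts (`ne`) and byte strides (`nb`), and `ggml_vk_dim01_contiguous` requires the first two dimensions to be tightly packed. The sketch below is a simplified illustration inferred from the assert message, not ggml's actual code (real ggml also accounts for quantization block sizes when computing strides):

```python
# Hedged sketch: what a "dims 0 and 1 contiguous" check conceptually verifies.
# ne[i] = number of elements along dimension i, nb[i] = byte stride of dimension i.
# Simplified for unquantized types; names/logic are illustrative assumptions.

def dim01_contiguous(ne, nb, type_size):
    """True if the first two dimensions are tightly packed (no padding)."""
    return nb[0] == type_size and nb[1] == nb[0] * ne[0]

# A 4x3 f32 tensor laid out with no padding passes the check:
print(dim01_contiguous(ne=[4, 3, 1, 1], nb=[4, 16, 48, 48], type_size=4))  # True
# A view with a padded row stride (20 bytes instead of 16) fails it,
# which is the kind of non-contiguous src0 the assert rejects:
print(dim01_contiguous(ne=[4, 3, 1, 1], nb=[4, 20, 60, 60], type_size=4))  # False
```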
stduhpf changed the title from "Eval bug: Deepseek V2 no longer working with Vulkan (assert fail during tg)" to "Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg)" on Apr 15, 2025

stduhpf commented Apr 15, 2025

Also when using a batch size of 1, the crash happens during prompt processing.


stduhpf commented Apr 15, 2025

OK, I found a fix. Now it's working, but I'm also noticing a small but significant performance regression in prompt processing speed.
d6d2c2a:

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |    sm |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ------------: | -------------------: |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        pp4096 |        328.76 ± 0.44 |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        tg1024 |         91.81 ± 0.59 |

build: d6d2c2ab (5133)

With my fix on top of master (7a8be3a):

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
| model                          |       size |     params | backend    | ngl |    sm |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | ------------: | -------------------: |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        pp4096 |        323.89 ± 0.93 |
| deepseek2 16B IQ4_NL - 4.5 bpw |   8.36 GiB |    15.71 B | Vulkan,RPC |  99 |   row |        tg1024 |         92.77 ± 0.19 |

build: 7a8be3ab (5140)
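Quantifying the two benchmark tables above (my own arithmetic, not part of the report): the pp4096 drop is outside the reported error bars, while tg1024 actually improves slightly.

```python
# Relative change between the two llama-bench runs quoted above
# (d6d2c2a baseline vs. the fix on top of master at 7a8be3a).
old_pp, new_pp = 328.76, 323.89  # pp4096 t/s
old_tg, new_tg = 91.81, 92.77    # tg1024 t/s

pp_delta = (new_pp - old_pp) / old_pp * 100
tg_delta = (new_tg - old_tg) / old_tg * 100
print(f"pp4096: {pp_delta:+.2f}%  tg1024: {tg_delta:+.2f}%")
# → pp4096: -1.48%  tg1024: +1.05%
```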
