
[BUG]: Error loading the LLava model #1136

Open
aropb opened this issue Mar 22, 2025 · 22 comments

Comments

@aropb commented Mar 22, 2025

Models:
https://huggingface.co/benxh/Qwen2.5-VL-7B-Instruct-GGUF
https://huggingface.co/KBlueLeaf/llama3-llava-next-8b-gguf (from here #897)
https://huggingface.co/second-state/Llava-v1.5-7B-GGUF

Error:
External component has thrown an exception.
System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
at LLama.Native.SafeLlavaModelHandle.clip_model_load(String mmProj, Int32 verbosity)
at LLama.Native.SafeLlavaModelHandle.LoadFromFile(String modelPath, Int32 verbosity)
at LLama.LLavaWeights.LoadFromFile(String mmProject)

What could be the problem?
I want to use a multimodal model for image-to-text conversion.

Environment & Configuration

  • Operating system: Windows
  • .NET runtime version: .NET 9.0.3
  • LLamaSharp version: 0.23.0
  • CUDA version (if you are using cuda backend): none (CPU backend)
  • CPU & GPU device: CPU
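The failing call can be isolated with a minimal sketch like the following (the file path is a placeholder; `LLavaWeights.LoadFromFile` is the frame shown in the stack trace above):

```csharp
using LLama;

// Minimal reproduction of the failing frame from the stack trace.
// LLavaWeights.LoadFromFile wraps the native clip_model_load call, which
// apparently throws the SEHException when the file passed in is not an
// mmproj (CLIP projector) file it recognizes. Path is a placeholder.
using var clip = LLavaWeights.LoadFromFile("mmproj-model-f16.gguf");
```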
@martindevans (Member) commented Mar 22, 2025

Does the model you're using work with the corresponding version of llama.cpp? Maybe also test in the previous version of LLamaSharp, to check for regressions.

@aropb (Author) commented Mar 22, 2025

I'll try. Can you tell me whether LLava is supposed to work with any multimodal model?

I also noticed that a context can only be created from LLamaWeights; why is that?

@aropb (Author) commented Mar 22, 2025

Does the model you're using work with the corresponding version of llama.cpp? Maybe also test in the previous version of LLamaSharp, to check for regressions.

0.21.0 - the same error

@aropb (Author) commented Mar 22, 2025

Does the model you're using work with the corresponding version of llama.cpp?

On the CPU - yes, it loads!

(screenshots attached)

@SignalRT (Collaborator)

I tested the master branch with the model used in the unit tests, successfully.

Tested on Windows with CUDA.

@aropb (Author) commented Mar 22, 2025

And on the CPU?

@aropb (Author) commented Mar 22, 2025

I tested the master with the model used in the unit test successfully.

Models/llava-v1.6-mistral-7b.Q3_K_XS.gguf?

@SignalRT (Collaborator)

As I explained, I tested on GPU. The unit tests are running successfully, so it should work. I'm talking about LLava; Qwen-VL is not tested and should not be expected to work.

@aropb (Author) commented Mar 22, 2025

Qwen-VL?

(screenshot attached)

@aropb (Author) commented Mar 22, 2025

Does LLava not work on the CPU?

@SignalRT (Collaborator)

Qwen-VL?

(screenshot attached)

Is that LlamaSharp documentation?

@aropb (Author) commented Mar 22, 2025

mmproj-model-f16.gguf - works!
llava-v1.6-mistral-7b-Q5_K_S.gguf - doesn't work!
https://huggingface.co/bartowski/Qwen2-VL-7B-Instruct-GGUF - doesn't work!

@aropb (Author) commented Mar 22, 2025

Is that LlamaSharp documentation?

Yes
https://github.com/ggml-org/llama.cpp

@aropb (Author) commented Mar 22, 2025

Should the LLava model be only the mmproj file?
Is it really impossible to use even Qwen2-VL for image-to-text?

Maybe I don't understand how to work with a VLM correctly.

@SignalRT (Collaborator)

mmproj-model-f16.gguf - works! llava-v1.6-mistral-7b-Q5_K_S.gguf - doesn't work! https://huggingface.co/bartowski/Qwen2-VL-7B-Instruct-GGUF - doesn't work!

You need both files. Check https://scisharp.github.io/LLamaSharp/0.23.0/QuickStart/ and the LLava example in the examples project.
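A sketch of what "both files" means in code, roughly following the linked QuickStart (file names are the ones from this thread; check the example project for the exact API):

```csharp
using LLama;
using LLama.Common;

// The main language model (llava-v1.6-mistral-7b-Q5_K_S.gguf)
// is loaded through LLamaWeights...
var parameters = new ModelParams("llava-v1.6-mistral-7b-Q5_K_S.gguf")
{
    ContextSize = 4096,
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);

// ...and the multimodal projector (mmproj-model-f16.gguf) is the only
// file that goes to LLavaWeights. Passing the main model file here is
// what makes clip_model_load throw.
using var clip = LLavaWeights.LoadFromFile("mmproj-model-f16.gguf");
```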

@SignalRT (Collaborator)

Is that LlamaSharp documentation?

Yes https://github.com/ggml-org/llama.cpp

That's llama.cpp documentation. Qwen-VL is not in the list of supported models in the LLamaSharp documentation: https://github.com/SciSharp/LLamaSharp

If I find some time I will test it.

@aropb (Author) commented Mar 22, 2025

Thanks. Yes, I get it.

@aropb (Author) commented Mar 23, 2025

If I find some time I will test it.

I found a model (main + mmproj):
https://huggingface.co/second-state/Qwen2-VL-7B-Instruct-GGUF

and here are many mmproj models:
https://huggingface.co/koboldcpp/mmproj/tree/main

All the models load, but the output doesn't work. The models from the example work, but they are weak and old. I would like to try Qwen2-VL or Gemma3 (it doesn't work yet; apparently a new version of llama.cpp is needed, ggml-org/llama.cpp#12344).

@SignalRT (Collaborator)

@aropb,

After conducting several tests and reviewing the current status of llama.cpp in relation to multimodal models, my understanding is as follows:

  1. Qwen2-VL is supported but operates using its own CLI: qwen2vl-cli.cpp.
  2. Gemma3 is still experimental and also requires its own CLI: gemma3-cli.cpp. Additional details can be found here.

Regarding LlamaSharp, only Llava and similar models are currently compatible. I don’t believe that replicating the work done in qwen2vl-cli or gemma3-cli would be the best approach. Instead, I recommend waiting for llama.cpp to introduce a vision API and then updating LlamaSharp’s multimodal support to integrate with that API.

@aropb (Author) commented Mar 23, 2025

Thanks.
Very useful information. I've been trying these models for two days now, and really nothing works except LLava :) Which LLava variant is considered the most recent and capable?

Apparently this one, but it is weak by modern standards for vision:
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-gguf

@aropb (Author) commented Mar 24, 2025

Instead, I recommend waiting for llama.cpp to introduce a vision API and then updating LlamaSharp’s multimodal support to integrate with that API.

I haven't found any information about this; can you show me where it is being discussed?

I found:
ggml-org/llama.cpp#9687
ggml-org/llama.cpp#11292

@SignalRT (Collaborator) commented May 1, 2025

Instead, I recommend waiting for llama.cpp to introduce a vision API and then updating LlamaSharp’s multimodal support to integrate with that API.

I haven't found any information about this, can you show me where they discuss it?

I found: ggml-org/llama.cpp#9687 ggml-org/llama.cpp#11292

#1178 - that's the information.

Development

No branches or pull requests

3 participants