Loading HQQ quantized models is broken since #35926 (#37263)
Comments
The fix is here: #37347. It was indeed introduced when adding DeepSeek!
Thank you @Cyrilvallez ! Seems to work!

```python
import torch
from transformers import Gemma3ForCausalLM, AutoProcessor

model = Gemma3ForCausalLM.from_pretrained(
    'mobiuslabsgmbh/gemma-3-12b-it_4bitgs64_bfp16_hqq_hf',
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="cuda",
)
```
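For completeness, a short hedged sketch of a generation smoke test with the loaded checkpoint; the tokenizer call and prompt below are assumptions, not part of the original comment:

```python
from transformers import AutoTokenizer

# Assumed smoke test (not from the original comment): run a short generation
# to confirm the HQQ-quantized weights load and execute end to end.
tokenizer = AutoTokenizer.from_pretrained("mobiuslabsgmbh/gemma-3-12b-it_4bitgs64_bfp16_hqq_hf")
inputs = tokenizer("Write a one-line haiku about quantization.", return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```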
It's because it's not a
Sorry, but there's still a problem loading HQQ quantized models. I noticed that the ones that have a bias no longer load: https://gist.github.com/mobicham/701dd564c52590203ee09631425ad797
It is related to this old commit: 4b5cf54
The test would have failed if the test file was using a model that has a bias, like
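A minimal sketch of the bias case, assuming the quantize / save / reload flow of the linked gist; the facebook/opt-125m checkpoint and the output path are stand-ins chosen here only because its linear layers carry a bias, not details taken from the gist:

```python
import torch
from transformers import AutoModelForCausalLM, HqqConfig

# Quantize a model whose nn.Linear layers carry a bias (facebook/opt-125m is
# an assumed stand-in, not the model from the gist).
quant_config = HqqConfig(nbits=4, group_size=64)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant_config,
)

# Serialize the HQQ-quantized weights, then reload them; reloading the
# bias-carrying layers is what the comment reports as broken.
model.save_pretrained("opt-125m-hqq")
reloaded = AutoModelForCausalLM.from_pretrained(
    "opt-125m-hqq",
    torch_dtype=torch.float16,
    device_map="cuda",
)
```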
System Info

transformers version: 4.51.0.dev0

Loading HQQ models is broken since #35926. Not sure what changed, probably something in modeling_utils.

@SunMarc @ArthurZucker
Reproduction
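A minimal sketch of the failing load, assuming the same pre-quantized HQQ checkpoint exercised in the comments above; the final print is only an assumed way to inspect whether the quantized linear layers were materialized:

```python
import torch
from transformers import Gemma3ForCausalLM

# Loading a pre-quantized HQQ checkpoint; this is the step that regressed
# after #35926.
model = Gemma3ForCausalLM.from_pretrained(
    "mobiuslabsgmbh/gemma-3-12b-it_4bitgs64_bfp16_hqq_hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map="cuda",
)

# Assumed check: the attention projections should be HQQ-quantized modules,
# not plain nn.Linear layers.
print(model.model.layers[0].self_attn)
```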
Expected behavior
HQQ quantized models were loading fine before #35926