
The output logits of Qwen-7B under 8-bit quantization contain NaN #1504


Open
Kairong-Han opened this issue Feb 9, 2025 · 2 comments

@Kairong-Han

My environment versions are as follows:

torch: 2.4.1
torchaudio: 2.4.1
torchvision: 0.19.1
cuda: 12.4
bitsandbytes: 0.42.0

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

When I use Qwen-7B, for example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

device = 'cuda:0'
tokenizer = AutoTokenizer.from_pretrained(model_name)  # model_name: the Qwen-7B checkpoint path
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, inference_mode=False, r=32, lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["gate_proj", "up_proj", "down_proj", "q_proj", "k_proj", "v_proj"],
)
model = get_peft_model(model, peft_config)

model.to(device='cuda:0')

prompt = tokenizer.encode('1+1=', return_tensors="pt", padding='max_length',
                          max_length=100, add_special_tokens=True).to('cuda:0')
labels = prompt
outputs = model(prompt.to(device), labels=labels.to(device), output_attentions=True)
```

The output of outputs.logits[0, :10, :10]:

```
tensor([[    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [ 7.0352, -1.2031,  1.0703,  0.5190,  2.5098,  6.9844,  0.9600,  2.6797,
          2.8594,  6.7031],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan,     nan],
        [ 3.9023,  5.9258, 11.9141,  4.4336,  4.9883,  2.0195,  3.9336,  5.5039,
         -0.5679,  4.5508],
        [ 3.8711,  4.6523, 12.2188,  3.4648,  5.2227,  1.4297,  3.0352,  4.8828,
         -1.4443,  4.4258]], device='cuda:0', dtype=torch.float16,
       grad_fn=<...>)
```
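For reference, a quick way to confirm which positions produce NaNs (a minimal diagnostic sketch; `outputs` is the model output from the snippet above):

```python
import torch

# Boolean mask over sequence positions: True where any logit at that
# position is NaN. Shape: (batch, seq_len).
nan_positions = torch.isnan(outputs.logits).any(dim=-1)
print(nan_positions.nonzero())       # indices of the affected positions
print(nan_positions.float().mean())  # fraction of positions that are NaN
```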

How can I solve this problem?

@matthewdouglas
Member

Please upgrade to the latest bitsandbytes. Additionally, you may wish to try with torch_dtype=torch.bfloat16.
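
For reference, a minimal sketch of both suggestions combined, assuming a recent transformers release where 8-bit loading goes through BitsAndBytesConfig (`model_name` as in the original snippet):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Passing load_in_8bit directly to from_pretrained is deprecated in
# recent transformers; BitsAndBytesConfig is the supported path.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,                      # same checkpoint as above
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,      # keep non-quantized modules in bf16
    device_map="auto",               # dispatch to GPU at load time
)
```

Note that with a quantization config the model is placed on the GPU at load time, and recent transformers versions refuse `.to()` on 8-bit bitsandbytes models, so the explicit `model.to(device='cuda:0')` from the original snippet should be dropped.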

@TimDettmers
Collaborator

What is likely happening is that quantization degrades quality somewhere in the model, and that then turns into NaN values. However, I use Qwen 2.5 7B regularly and do not see this problem, so something else might be wrong. We would appreciate more information if upgrading bitsandbytes does not resolve it.
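
One way to gather that information is to locate the first module whose output turns to NaN, e.g. with forward hooks (a minimal sketch; the `register_nan_hooks` helper is hypothetical, not part of any library):

```python
import torch

def register_nan_hooks(model):
    """Print the name of every module whose output contains NaN."""
    handles = []
    for name, module in model.named_modules():
        def hook(mod, inputs, output, name=name):
            # Module outputs may be tensors or tuples of tensors.
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and torch.isnan(out).any():
                print(f"NaN in output of: {name}")
        handles.append(module.register_forward_hook(hook))
    return handles

handles = register_nan_hooks(model)
outputs = model(prompt.to(device), labels=labels.to(device))
for h in handles:
    h.remove()  # clean up the hooks afterwards
```

The first module printed in forward order is where the NaNs originate.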
