System Info
Latest transformers 4.51.0
Who can help?
@ArthurZucker @SunMarc @MekkCyber I know you guys are probably pulling an all-nighter due to Llama 4. =)
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
While adding GPTQModel support for Llama 4, which does transformers inference at the layer/module level, we found that when eager attention is used, the causal mask shapes are off by exactly a factor of 2 or 4 for Llama 4:
ModelCloud/GPTQModel#1508
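For reference, here is a minimal sketch of the kind of batched, padded eager-attention forward pass that hits the mismatch. The model id, prompts, and Auto classes below are placeholders of ours, not taken from the issue, and may need adjusting for the actual Llama 4 checkpoints:

```python
# Illustrative sketch (not from the issue): batched, padded inputs with an explicit
# attention_mask and attn_implementation="eager".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # placeholder checkpoint, assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding is possible

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # the code path where the causal mask shape is wrong
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompts = ["Hello", "A much longer prompt so the batch needs padding"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

# With batch > 1, padding, and an attention_mask, the eager attention path fails
# with a causal mask whose shape is off by a factor of 2 or 4.
with torch.no_grad():
    model(**inputs)
```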
We are able to bypass this for now by setting batch=1, removing padding, and not passing attention_mask for inference.
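A sketch of that workaround, continuing from the snippet above (again illustrative, not the exact GPTQModel code):

```python
# Workaround sketch: single sample, no padding, and no attention_mask passed in.
prompt = "Hello"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)  # batch=1, unpadded

with torch.no_grad():
    model(input_ids=input_ids)  # attention_mask deliberately omitted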
Expected behavior
No error.