
Llama 4: eager attention results in wrong causal mask shape #37322

Status: Open
Qubitium opened this issue Apr 6, 2025 · 1 comment
Qubitium (Contributor) commented Apr 6, 2025

System Info

Latest transformers 4.51.0

Who can help?

@ArthurZucker @SunMarc @MekkCyber I know you guys are probably burning an all-nighter due to Llama 4. =)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

While adding GPTQModel support for Llama 4 (GPTQModel runs transformers inference at the layer/module level), we found that when eager attention is used, the causal mask shape is off by exactly a factor of 2 or 4 for Llama 4:

ModelCloud/GPTQModel#1508

[Two screenshots of the resulting error output were attached here.]
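A minimal sketch of the kind of call that hits the problem, assuming a Llama 4 checkpoint such as meta-llama/Llama-4-Scout-17B-16E-Instruct loaded via AutoModelForCausalLM (the exact loader class and inputs in GPTQModel differ, since it runs inference per layer/module):

```python
# Hypothetical reproduction sketch, not the exact GPTQModel code path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Llama 4 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # needed for padding

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",   # the problematic attention path
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Batched, padded inputs with an explicit attention_mask are what trigger
# the causal-mask shape mismatch in the eager attention forward.
prompts = ["Hello world", "A much longer prompt that forces padding in this batch"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    out = model(**inputs)  # fails with a shape-mismatch error under eager attention
```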

We are able to bypass this for now by setting batch size to 1, removing padding, and not passing attention_mask during inference (see the sketch below).
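For reference, the workaround looks roughly like this (reusing the hypothetical model and tokenizer from the sketch above):

```python
# Workaround sketch: batch size 1, no padding, and no attention_mask passed,
# which sidesteps the causal-mask construction that produces the wrong shape.
single = tokenizer("Hello world", return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(input_ids=single["input_ids"])  # attention_mask intentionally omitted
```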

Expected behavior

No error.

Qubitium added the bug label Apr 6, 2025
ArthurZucker (Collaborator) commented

Thanks, will have a look ASAP!
