
Llama 4: eager attention results in wrong causal mask shape #37322

Status: Open
Qubitium opened this issue Apr 6, 2025 · 1 comment
Qubitium (Contributor) commented Apr 6, 2025

System Info

Latest transformers 4.51.0

Who can help?

@ArthurZucker @SunMarc @MekkCyber I know you guys are probably burning an all-nighter due to Llama 4. =)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

While adding GPTQModel support for Llama 4 (GPTQModel runs transformers inference at the layer/module level), we found that when eager attention is used, the causal mask shape is off by exactly a factor of 2 or 4 for Llama 4:

ModelCloud/GPTQModel#1508

[Two screenshots of the resulting error output were attached here.]
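A minimal sketch of the kind of call that hits the problem, assuming a Llama 4 checkpoint such as meta-llama/Llama-4-Scout-17B-16E-Instruct loaded via AutoModelForCausalLM (the exact loader class and inputs in GPTQModel differ, since it runs inference per layer/module):

```python
# Hypothetical reproduction sketch, not the exact GPTQModel code path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Llama 4 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # needed for padding

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",   # the problematic attention path
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Batched, padded inputs with an explicit attention_mask are what trigger
# the causal-mask shape mismatch in the eager attention forward.
prompts = ["Hello world", "A much longer prompt that forces padding in this batch"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    out = model(**inputs)  # fails with a shape-mismatch error under eager attention
```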

We are able to bypass this for now by setting batch size to 1, removing padding, and not passing attention_mask during inference (see the sketch below).
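For reference, the workaround looks roughly like this (reusing the hypothetical model and tokenizer from the sketch above):

```python
# Workaround sketch: batch size 1, no padding, and no attention_mask passed,
# which sidesteps the causal-mask construction that produces the wrong shape.
single = tokenizer("Hello world", return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(input_ids=single["input_ids"])  # attention_mask intentionally omitted
```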

Expected behavior

No error.

Qubitium added the bug label Apr 6, 2025
ArthurZucker (Collaborator) commented

Thanks, will have a look ASAP!
