flex_attention support for Qwen2.5/Gemma is broken #37299
Comments
I haven't used Gemma yet but noticed that its implementation also omits handling the flex_attention case in the Please correct me if I'm wrong, but this appears to be a wider bug in the model code generation logic that misses handling flex_attention when the model has |
Yes, this is required indeed! Do you want to open a PR for a fix?
Yes I'd like to! Will do once I'm sure what the fix is. I'm new to internals of Now Potential fix ideas:
Technically 2 alone would solve the Qwen2 issue too, but 1 would still simplify things. Similarly, the Gemma3 text model inheritance goes like:
It comes down to keeping all masking related logic contained in Edit: the above fixes would require some additions to PS: why does |
just update the overwritter
@Cyrilvallez is reworking our attention creation api
System Info
transformers version: 4.50.3
Who can help?
@ArthurZucker
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
This snippet should run without error, like it does for meta-llama/Llama-3.2-1B, since Qwen2.5 model is based on Llama arch and both support flex_attention (_supports_flex_attention=True).
The error occurs because Qwen2Model._update_causal_mask() doesn't handle the case when flex_attention is enabled and the block mask is passed in as the attention_mask. This is handled in LlamaModel._update_causal_mask().
IIUC, adding the same handling to Qwen2Model should fix the issue, and indeed this works on my local fork. But Qwen2Model code is auto-generated, so it must be fixed elsewhere.
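The handling described above (passing a prebuilt block mask straight through when flex_attention is enabled) can be sketched roughly as follows. This is a dependency-free illustration and not the actual transformers code: the BlockMask class here is a hypothetical stand-in for torch.nn.attention.flex_attention.BlockMask, and update_causal_mask and build_dense_causal_mask are made-up names used only to show the shape of the early return.

```python
class BlockMask:
    """Hypothetical stand-in for torch.nn.attention.flex_attention.BlockMask."""


def build_dense_causal_mask(seq_len):
    # Hypothetical helper: lower-triangular boolean mask, True = may attend.
    return [[col <= row for col in range(seq_len)] for row in range(seq_len)]


def update_causal_mask(attn_implementation, attention_mask, seq_len):
    # Assumed shape of the missing branch: with flex_attention enabled, a
    # prebuilt block mask is returned unchanged instead of being treated
    # like a dense padding mask.
    if attn_implementation == "flex_attention" and isinstance(attention_mask, BlockMask):
        return attention_mask
    # Every other case falls back to building a dense causal mask.
    return build_dense_causal_mask(seq_len)


block_mask = BlockMask()
passthrough = update_causal_mask("flex_attention", block_mask, seq_len=4)
dense = update_causal_mask("sdpa", None, seq_len=3)
```

Without the early return, the block mask would reach the dense-mask code path, which is presumably where a model that lacks the branch fails.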