Llama4 chunked attention support #395
Conversation
What is the plan to merge these code changes? As we have cut the branch for 1.20, we can now plan to merge the Llama4 changes into the main branch.
Force-pushed from f59f636 to 8a80c89.
@@ -929,6 +948,8 @@ def get_specializations(
    "batch_size_times_num_tiles": batch_size_times_num_tiles,
    "img_size": img_size,
    "vision_size": vision_size,
    "chunk_length": prefill_seq_len,
    "chunk_ctx_len": chunk_ctx_len,
In the specializations, we also need the total CL (context length), right? For the NoPE layers' KV cache.
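For illustration, a minimal sketch of what this ask could look like next to the hunk above; the `ctx_len` key name and the literal values are assumptions, not the PR's code:

```python
# Sketch only: carry the full context length for the NoPE (global-attention)
# layers' KV cache alongside the chunked-attention values.
specializations = {
    "batch_size_times_num_tiles": 17,  # example value
    "img_size": 336,                   # example value
    "vision_size": 5888,               # example value
    "chunk_length": 128,               # prefill_seq_len, as in the hunk above
    "chunk_ctx_len": 8192,             # chunked-attention window context
    "ctx_len": 16384,                  # assumed key: total CL for NoPE layers' KV
}
```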
_, chunk_causal_mask = self._update_causal_mask(
    attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
causal_mask = _create_causal_mask(
    position_ids=position_ids, target_length=past_key_values.key_cache[3].shape[-2]
Here, instead of the hard-coded `key_cache[3]` index, can we generalize it using some config value?
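One possible shape for that generalization, as a rough sketch only: it assumes the Llama4 text config exposes a per-layer flag such as `no_rope_layers` where a zero entry marks a NoPE (global-attention) layer; both the attribute name and its semantics are assumptions, not verified against the PR.

```python
# Sketch only: derive the index of the first global/NoPE layer from the config
# instead of hard-coding key_cache[3].
def first_nope_layer_idx(config, default: int = 3) -> int:
    flags = getattr(config, "no_rope_layers", None)  # assumed attribute name
    if not flags:
        return default  # fall back to the current hard-coded index
    for idx, uses_rope in enumerate(flags):
        if uses_rope == 0:  # assumed: 0 means the layer skips RoPE (NoPE layer)
            return idx
    return default

# Usage sketch:
# target_length = past_key_values.key_cache[first_nope_layer_idx(config)].shape[-2]
```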
@@ -259,6 +259,151 @@ def update3D(

        return k_out, v_out

    def _sliding_update(
@asmigosw As we have discussed, please restructure this to reuse the hybrid cache function.
Also, please report the output match against the reference, using a smaller config for chunked_window. It will be hard to verify at 8K, so change the config for testing purposes.
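A small sketch of that test setup, assuming the chunked window is controlled by `text_config.attention_chunk_size` (the exact field name is an assumption):

```python
from transformers import AutoConfig

# Sketch only: shrink the chunked-attention window so an output-parity check
# against the reference HF model stays tractable (8K is too long to verify).
cfg = AutoConfig.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")
cfg.text_config.attention_chunk_size = 128   # assumed field; small window for testing
cfg.text_config.num_hidden_layers = 4        # optional: shrink depth for a quick run
```

With this, build both the reference model and the transformed model from the same `cfg`, run an identical prompt through each, and compare the logits or generated tokens.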
Force-pushed from 1066b4f to d8a947a.
Signed-off-by: vbaddi <[email protected]>
Signed-off-by: Rishin <[email protected]>
Signed-off-by: Rishin Raj <[email protected]>
Signed-off-by: Asmita Goswami <[email protected]>
Signed-off-by: Ann <[email protected]>
No description provided.