Llama4 chunked attention support #395
base: add_llama4
Conversation
What is the plan for merging these code changes? Since we have cut the branch for 1.20, we can now plan to merge the Llama4 changes into the main branch.
Force-pushed from e5e2218 to 8bcbdc0
@@ -929,6 +948,8 @@ def get_specializations(
     "batch_size_times_num_tiles": batch_size_times_num_tiles,
     "img_size": img_size,
     "vision_size": vision_size,
+    "chunk_length": prefill_seq_len,
+    "chunk_ctx_len": chunk_ctx_len,
In the specializations, we also need the total CL (context length), right? It is needed for the NoPE layers' KV cache.
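A minimal sketch of what that could look like, assuming the specialization keys from the diff above; the `ctx_len` key and the helper below are illustrative, not part of this PR:

def build_specialization(batch_size_times_num_tiles, img_size, vision_size,
                         prefill_seq_len, chunk_ctx_len, ctx_len):
    # Chunked-attention layers specialize on the chunk window, while NoPE
    # (global-attention) layers need the full context length to size their KV cache.
    return {
        "batch_size_times_num_tiles": batch_size_times_num_tiles,
        "img_size": img_size,
        "vision_size": vision_size,
        "chunk_length": prefill_seq_len,
        "chunk_ctx_len": chunk_ctx_len,
        "ctx_len": ctx_len,  # hypothetical: total context length for the NoPE layers' KV
    }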
-        _, chunk_causal_mask = self._update_causal_mask(
-            attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
+        causal_mask = _create_causal_mask(
+            position_ids=position_ids, target_length=past_key_values.key_cache[3].shape[-2]
Here, for key_cache[3], can we generalize using a config value instead of the hard-coded index?
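A hedged sketch of one way to do that, assuming a Llama4-style text config exposes a `no_rope_layers` list in which 0 marks a NoPE (global-attention) layer; the helper name is hypothetical:

def first_nope_layer_idx(config):
    # Return the index of the first NoPE (global-attention) layer so callers
    # can avoid hard-coding key_cache[3].
    for idx, uses_rope in enumerate(config.no_rope_layers):
        if uses_rope == 0:
            return idx
    raise ValueError("config.no_rope_layers contains no NoPE layer")

# Illustrative usage:
# causal_mask = _create_causal_mask(
#     position_ids=position_ids,
#     target_length=past_key_values.key_cache[first_nope_layer_idx(self.config)].shape[-2],
# )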
Added Hybrid Chunked Cache for Llama4
@@ -259,6 +259,151 @@ def update3D(

         return k_out, v_out

+    def _sliding_update(
@asmigosw As we have discussed, please restructure this to reuse the hybrid cache function.
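For context, a minimal sketch of the sliding (chunked) cache update technique that the new method implements, not the PR's actual code; it assumes caches shaped [batch, heads, chunk_ctx_len, head_dim] and writes new tokens at positions wrapped modulo the chunk window:

import torch

def sliding_update(k_cache, v_cache, k_new, v_new, cache_position, chunk_ctx_len):
    # Only the most recent `chunk_ctx_len` tokens are retained for
    # chunked-attention layers; older entries are overwritten in place.
    slot = (cache_position % chunk_ctx_len).to(torch.long)
    k_out = k_cache.index_copy(2, slot, k_new)
    v_out = v_cache.index_copy(2, slot, v_new)
    return k_out, v_out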
Force-pushed from 1066b4f to d8a947a
No description provided.