
Add Llama4 Multi-Modal Support #382

Open · wants to merge 26 commits into main

Conversation

vbaddi
Contributor

@vbaddi vbaddi commented Apr 29, 2025

No description provided.

@vbaddi vbaddi requested a review from mohiso22 April 29, 2025 10:41
@quic-rishinr quic-rishinr marked this pull request as draft April 30, 2025 05:37
def __qeff_init__(self):
self.rotary_embedding = QEffLlama4VisionRotaryEmbedding(config=self.config)

def forward(
Contributor Author

Do we need this forward? Remove it.

17, # constants.INTERN_NUM_PATCHES,
14, # constants.INTERN_FEATURE_SIZE,
inputs_shapes["vision_embeds"] = (
1, # constants.INTERN_NUM_PATCHES,
Contributor Author

Move to constants, or compute based on the code?
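A minimal sketch of the "move to constants" option; the constant names below are illustrative (they do not exist in the PR), and the values simply reuse the numbers visible in the diff above.

# Hypothetical entries in the shared constants module (names are illustrative).
LLAMA4_NUM_PATCHES = 17
LLAMA4_FEATURE_SIZE = 14

inputs_shapes = {}
inputs_shapes["vision_embeds"] = (
    1,                    # single image in the dummy input
    LLAMA4_NUM_PATCHES,   # previously hard-coded as 17
    LLAMA4_FEATURE_SIZE,  # previously hard-coded as 14
)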

@quic-amitraj quic-amitraj force-pushed the add_llama4 branch 2 times, most recently from 75242b2 to 1b3dd96 Compare May 8, 2025 10:05
pyproject.toml Outdated
@@ -19,8 +19,8 @@ classifiers = [
]
requires-python = ">=3.8,<3.11"
dependencies = [
"transformers==4.50.0",
"huggingface-hub==0.27.0",
"transformers==4.51.0",


We have moved transformers to 4.51.3 in vLLM

@@ -46,6 +46,7 @@ class QEFFBaseModel(ABC):
def _transform_names(cls) -> List[str]:
return [x.__name__ for x in cls._pytorch_transforms + cls._onnx_transforms]

@append_tranform
Contributor

Instead of adding a decorator to the constructor, define a method for applying PyTorch transforms and pass it a copy of cls._pytorch_transforms. You can add the decorator to that method.
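A rough sketch of the suggested shape, assuming append_tranform simply post-processes the list returned by the decorated method (its real body is not shown in this diff); method names other than _pytorch_transforms are illustrative.

from abc import ABC
from typing import List

import torch.nn as nn


def append_tranform(fn):
    # Stand-in for the PR's decorator; assumed to extend the transform list
    # returned by the decorated method rather than wrapping __init__.
    return fn


class QEFFBaseModel(ABC):
    _pytorch_transforms: List[type] = []
    _onnx_transforms: List[type] = []

    @classmethod
    @append_tranform  # decorate a dedicated method instead of the constructor
    def _get_pytorch_transforms(cls) -> List[type]:
        # Return a copy so the decorator never mutates the class-level list
        # shared by every model.
        return list(cls._pytorch_transforms)

    def _apply_pytorch_transforms(self, model: nn.Module) -> nn.Module:
        for transform in self._get_pytorch_transforms():
            model, _ = transform.apply(model)
        return model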

real, imag = torch.cos(freqs), torch.sin(freqs)
freqs_cis = torch.stack([real, imag], dim=-1)

# freqs_cis = torch.view_as_complex(freqs_cis.contiguous())
Contributor

remove commented lines
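For context, the stacked (cos, sin) tensor carries the same information as the complex freqs_cis while keeping everything real-valued, which is friendlier to ONNX export. A minimal sketch of the real-valued rotation, assuming the last dimension pairs up (real, imag) components; the PR's actual vision rotary code may differ in detail.

import torch

def build_freqs_cis(dim: int, seq_len: int, theta: float = 10000.0) -> torch.Tensor:
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    # Keep cos/sin as a trailing (real, imag) pair instead of a complex tensor.
    return torch.stack([torch.cos(freqs), torch.sin(freqs)], dim=-1)  # (seq, dim//2, 2)

def apply_rotary(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (..., seq, dim), interpreted as (..., seq, dim//2, 2) real/imag pairs
    x_ = x.view(*x.shape[:-1], -1, 2)
    cos, sin = freqs_cis[..., 0], freqs_cis[..., 1]
    real = x_[..., 0] * cos - x_[..., 1] * sin   # real part of (a + ib)(cos + i sin)
    imag = x_[..., 0] * sin + x_[..., 1] * cos   # imaginary part
    return torch.stack([real, imag], dim=-1).flatten(-2)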

final = shared_out + expert_out # restore [B,S,H]
return final.view(B, S, H), router_logits

# ------------------- Gather based, weights as activation approach ---------------
Contributor

Do we need this?

Contributor

I guess not; will remove the other commented approaches.

@@ -1,13 +1,14 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) 2025 Qualcomm Innovation Center, Inc. All rights reserved.
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
Contributor

please update the year

@@ -49,29 +50,28 @@
GraniteModel,
GraniteRMSNorm,
)
Contributor

Why is GraniteMoe removed?

QEffGraniteMoeParallelExperts,
QEffGraniteMoeRotaryEmbedding,
QEffGraniteMoeTopKGating,
from QEfficient.transformers.models.internvl.modeling_internvl import (
Contributor

Same as above

@@ -401,6 +404,8 @@ class KVCacheTransform(ModuleMappingTransform):
@classmethod
def apply(cls, model: nn.Module) -> Tuple[nn.Module, bool]:
model, transformed = super().apply(model)
# FIXME: see if we can merge into _module_mapping dict
transformers.cache_utils.DynamicCache.update = QEffDynamicCache.update
Contributor

Do we need it here, since we will be going with HybridChunkCache?

@quic-rishinr
Contributor

@mohiso22 please rebase the PR, add a test, and add the model to the validated models list.

@mohiso22 mohiso22 force-pushed the add_llama4 branch 2 times, most recently from 5d3870d to 77afbbc Compare June 9, 2025 10:22

lang_inputs = {k: v for k, v in inputs.items() if k not in vision_inputs}
lang_inputs["position_ids"] = np.where(
lang_inputs.pop("attention_mask"), np.arange(padded_len), -1
) # Need to use -1 as position_ids for invalid tokens

NOT_MLLAMA = hasattr(self.model.config, "model_type") and self.model.config.model_type != "mllama"
if NOT_MLLAMA:
lang_inputs["index"] = np.array([[0]])
Contributor Author

@mohiso22 can we rename index to something more readable, e.g. image_idx?
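As an aside on the np.where line quoted above: it produces position ids like the following toy example, where padded positions get -1 so they can be treated as invalid downstream.

import numpy as np

padded_len = 6
attention_mask = np.array([[1, 1, 1, 1, 0, 0]])  # 4 real tokens, 2 padding tokens
position_ids = np.where(attention_mask, np.arange(padded_len), -1)
# position_ids -> [[0, 1, 2, 3, -1, -1]]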

@vbaddi vbaddi marked this pull request as ready for review June 10, 2025 06:16
self.language_model = self.model.language_model
self.config = self.model.config

def forward(self, input_ids, vision_embeds, position_ids, index, past_key_values):
Contributor

  1. Can you add a detailed docstring describing the interleaving logic we are using here?
  2. Please rename the index variable to something more descriptive, such as vision_start_index; the current name index is too generic.
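A rough sketch of what the docstring and rename might look like. The interleaving description is inferred from the gather/scatter code quoted in this PR; vision_start_index is the reviewer's suggested name, and the other identifiers are illustrative, not names that exist in the PR.

import torch


def merge_vision_embeddings(
    inputs_embeds: torch.Tensor,   # (batch, seq, hidden) text embeddings
    vision_embeds: torch.Tensor,   # (num_image_tokens, hidden) projected image features
    input_ids: torch.Tensor,       # (batch, seq) token ids
    image_token_index: int,        # id of the image placeholder token
    vision_start_index: int = 0,   # offset into vision_embeds (e.g. for chunked prefill)
) -> torch.Tensor:
    """Interleave image features into the text embedding sequence.

    Every position where input_ids equals image_token_index is replaced by the
    next unused row of vision_embeds, starting at vision_start_index; all other
    positions keep their original text embeddings.
    """
    image_token_mask = input_ids == image_token_index                    # (batch, seq) bool
    # Running count of image tokens gives each placeholder its source row.
    vision_row = image_token_mask.to(torch.int64).cumsum(dim=1) - 1 + vision_start_index
    vision_row = vision_row.clamp(min=0, max=vision_embeds.size(0) - 1)
    gathered = vision_embeds[vision_row]                                 # (batch, seq, hidden)
    return torch.where(image_token_mask.unsqueeze(-1), gathered, inputs_embeds)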

def forward(self, input_ids, vision_embeds, position_ids, index, past_key_values):
inputs_embeds = self.model.language_model.get_input_embeddings()(input_ids)
selected = input_ids == self.model.config.image_token_index
indices1 = selected.to(torch.int64).cumsum(1) - 1
Contributor

Instead of indices0 and indices1, please use more descriptive variable names.

def get_qeff_language_decoder(self):
return QEffLlama4DecoderWrapper(self)

def forward(self, input_ids, position_ids, pixel_values, index, past_key_values):
Contributor

Please use a more descriptive variable name for index.

vision_flat = image_features.view(-1, image_features.size(-1))
projected_vision_flat = self.multi_modal_projector(vision_flat)
selected = input_ids == self.config.image_token_index
indices1 = selected.to(torch.int64).cumsum(1) - 1
Contributor

Same as above


lang_inputs = {k: v for k, v in inputs.items() if k not in vision_inputs}
lang_inputs["position_ids"] = np.where(
lang_inputs.pop("attention_mask"), np.arange(padded_len), -1
) # Need to use -1 as position_ids for invalid tokens

NOT_MLLAMA = hasattr(self.model.config, "model_type") and self.model.config.model_type != "mllama"
Contributor

It's not a constant, so why are we using uppercase here?

Contributor

OK, will change it to lowercase.

@@ -0,0 +1,50 @@
# -----------------------------------------------------------------------------
Contributor

Remove this file. Let's maintain only one example with different inputs: text only, text + image, and text + multiple images.

Contributor

ok sure
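The consolidation asked for above might look roughly like this on the input side, using the Hugging Face processor API. The model id is illustrative, and the QEfficient compile/generate calls are omitted because their exact signatures are outside this excerpt.

from typing import List, Optional

from transformers import AutoProcessor

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # illustrative
processor = AutoProcessor.from_pretrained(model_id)

def build_inputs(prompt: str, image_urls: Optional[List[str]] = None):
    # One helper covers text only, text + image, and text + multiple images.
    content = [{"type": "image", "url": url} for url in (image_urls or [])]
    content.append({"type": "text", "text": prompt})
    messages = [{"role": "user", "content": content}]
    return processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

inputs_text_only = build_inputs("Summarize the Llama 4 architecture in one sentence.")
inputs_one_image = build_inputs("What is in this image?", ["https://example.com/a.png"])
inputs_two_images = build_inputs("Compare these images.",
                                 ["https://example.com/a.png", "https://example.com/b.png"])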

@quic-amitraj quic-amitraj force-pushed the add_llama4 branch 2 times, most recently from ee23c3a to 1066b4f Compare June 10, 2025 15:09
vbaddi and others added 7 commits June 10, 2025 15:10
vbaddi and others added 19 commits June 10, 2025 15:10