Add Qwen Moe #2163


Open · wants to merge 31 commits into master
Conversation

kanpuriyanawab
Collaborator

@kanpuriyanawab kanpuriyanawab commented Mar 23, 2025

This PR adds the Qwen Mixture-of-Experts (MoE) model to Keras Hub.

Hugging Face reference: link

Qwen MoE output matching:

[Screenshot 2025-04-20 at 2:18:58 PM]

@kanpuriyanawab kanpuriyanawab self-assigned this Mar 29, 2025
@kanpuriyanawab kanpuriyanawab marked this pull request as ready for review March 29, 2025 05:06
@mattdangerw mattdangerw removed the request for review from divyashreepathihalli March 31, 2025 16:41
@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 31, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 31, 2025
Collaborator

@divyashreepathihalli left a comment

Just reviewed the MoE part of the code.

@@ -79,7 +79,7 @@ def build(self, decoder_sequence_shape):
        self.hidden_dim = decoder_sequence_shape[-1]

        # Self attention layer.
-       self._self_attention_layer = QwenAttention(
+       self._self_attention_layer = QwenMoeAttention(
Collaborator

I see you have forked this file in qwen_moe folder - why is this being edited?

Collaborator Author

Looks like this slipped in during a find & replace; fixed it.

@kanpuriyanawab
Collaborator Author

@divyashreepathihalli How should we accommodate the aux_loss for the CausalLM task model here?

We are specifying SparseCategoricalCrossentropy loss here:

if optimizer == "auto":
    optimizer = keras.optimizers.Adam(2e-5)
if loss == "auto":
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
if weighted_metrics == "auto":
    weighted_metrics = [keras.metrics.SparseCategoricalAccuracy()]
super().compile(
    optimizer=optimizer,
    loss=loss,
    weighted_metrics=weighted_metrics,
    **kwargs,
)
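For context: in Keras, losses registered with `add_loss` inside a layer's `call` are summed with the compiled loss during `fit`, so the router aux loss does not need to appear in `compile()` at all. A rough numpy sketch of the resulting total loss, using hypothetical logits, labels, and aux loss values (none of these numbers come from the model):

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    # Softmax cross-entropy against integer labels, from logits.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical batch of logits and labels.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
task_loss = sparse_categorical_crossentropy(logits, labels)

# Assumed coefficient and an aux loss value as the MoE layer would
# have registered via add_loss; Keras adds it to the compiled loss.
router_aux_loss_coefficient = 0.001
aux_loss = 0.42
total_loss = task_loss + router_aux_loss_coefficient * aux_loss
print(task_loss, total_loss)
```

This is why the `compile()` above can stay unchanged: the aux term rides along automatically once the layer calls `add_loss`.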

Collaborator

@divyashreepathihalli left a comment

Thanks for the updates @heyyanshuman!
I left some comments on the PR regarding tf ops.
Please add tests for the layers, backbones, and tasks.
I am curious whether model.fit works. Do you have a demo colab for inference and fine-tuning? I am looking for the aux loss implementation.

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(
Collaborator

@divyashreepathihalli Apr 10, 2025

You might want to rename these to match other KH models here (value_dense and query_dense).
This will allow enabling LoRA on this model:

return ["query_dense", "value_dense", "query", "value"]
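To illustrate why the names matter: LoRA target layers are selected by matching patterns like the ones above against layer names. The helper below is a hypothetical sketch of that name-based selection, not keras-hub's actual implementation:

```python
# Illustrative only: LoRA targeting works by matching target patterns
# against layer names, so non-standard names are silently skipped.
def lora_targets(layer_names, patterns):
    """Return the layer names matched by any LoRA target pattern."""
    return [name for name in layer_names if any(p in name for p in patterns)]

patterns = ["query_dense", "value_dense", "query", "value"]

# With KH-style names, the query/value projections are picked up.
kh_names = ["query_dense", "key_dense", "value_dense", "output_dense"]
print(lora_targets(kh_names, patterns))    # ['query_dense', 'value_dense']

# With non-matching names like "_q_proj", nothing is selected.
other_names = ["_q_proj", "_k_proj", "_v_proj"]
print(lora_targets(other_names, patterns))  # []
```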

Collaborator Author

I don't have access to this document :(

Collaborator

Sorry, copy-paste error! Updated the link:

return ["query_dense", "value_dense", "query", "value"]

Collaborator

@divyashreepathihalli left a comment

Thanks @heyyanshuman - can you add a demo colab for inference and fit? And also provide a colab/screenshot for numerics verification?

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry! wrong copy pasta error! updated the link -

return ["query_dense", "value_dense", "query", "value"]

@kanpuriyanawab
Collaborator Author

Output matching screenshot:

[image]

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Apr 15, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Apr 15, 2025
Collaborator

@divyashreepathihalli left a comment

Left a few more nit comments! Looking great overall!
Once the comments are addressed and the inference and fit demos are added, it is ready for merge.

@kanpuriyanawab
Collaborator Author

Qwen MoE output matching:

[Screenshot 2025-04-20 at 2:18:58 PM]

Collaborator

@divyashreepathihalli left a comment

Thanks Anshuman! Left a few nit comments. Can you please add a presets file as well?


def main(_):
# === Get the preset name ===
# if FLAGS.preset not in PRESET_MAP.keys():
Collaborator

uncomment the code

# )
# preset = FLAGS.preset
# hf_preset = PRESET_MAP[preset]
hf_preset = "Qwen/Qwen1.5-MoE-A2.7B"
Collaborator

let us not hardcode this
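A minimal sketch of the flag-driven preset lookup the reviewer is asking for. The map contents and preset name below are assumptions based on the hardcoded checkpoint above, not the final preset list:

```python
# Hypothetical preset map for the conversion script; only the
# Qwen1.5-MoE-A2.7B entry comes from the hardcoded value above.
PRESET_MAP = {
    "qwen1.5_moe_2.7b_en": "Qwen/Qwen1.5-MoE-A2.7B",
}

def resolve_hf_preset(preset):
    """Look up the Hugging Face checkpoint for a preset name."""
    if preset not in PRESET_MAP:
        raise ValueError(
            f"Invalid preset {preset!r}. Must be one of {list(PRESET_MAP)}"
        )
    return PRESET_MAP[preset]

print(resolve_hf_preset("qwen1.5_moe_2.7b_en"))  # Qwen/Qwen1.5-MoE-A2.7B
```

With this shape, `FLAGS.preset` selects the checkpoint and an unknown preset fails fast instead of silently converting the wrong model.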

top_k=self.top_k,
attention_mask=attention_mask,
)
self.add_loss(self.router_aux_loss_coefficient * aux_loss)
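For readers unfamiliar with the router aux loss: MoE models typically add a load-balancing term so tokens spread evenly across experts, commonly in the style of the Switch Transformer loss (num_experts times the sum over experts of token fraction times mean router probability). The numpy sketch below illustrates that formulation under that assumption; the Qwen implementation may differ in details:

```python
import numpy as np

def load_balancing_aux_loss(router_logits, top_k):
    """Switch-Transformer-style load balancing loss (illustrative).

    router_logits: (num_tokens, num_experts) raw router scores.
    Penalizes imbalance between the fraction of tokens routed to
    each expert (f_i) and the mean router probability (P_i).
    """
    num_tokens, num_experts = router_logits.shape
    # Softmax over the expert dimension.
    exp = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Hard top-k expert assignment per token.
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, top_idx, 1.0, axis=-1)
    tokens_per_expert = mask.mean(axis=0)        # f_i
    router_prob_per_expert = probs.mean(axis=0)  # P_i
    return num_experts * np.sum(tokens_per_expert * router_prob_per_expert)

# With uniform router probabilities the loss evaluates to 1.0;
# skewed routing pushes it higher.
uniform_logits = np.zeros((8, 4))
print(load_balancing_aux_loss(uniform_logits, top_k=1))  # 1.0
```

Scaled by `router_aux_loss_coefficient` and registered via `add_loss` as above, this term is minimized when routing is balanced.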
Collaborator

nice!!

@divyashreepathihalli
Collaborator

Also, I want to see generate output matching.
