Add Qwen Moe #2163
base: master
Conversation
Force-pushed from c76184e to d391cd2.
Just reviewed the MoE part of the code.
@@ -79,7 +79,7 @@ def build(self, decoder_sequence_shape):
        self.hidden_dim = decoder_sequence_shape[-1]

        # Self attention layer.
-        self._self_attention_layer = QwenAttention(
+        self._self_attention_layer = QwenMoeAttention(
Looks like this slipped in during a find & replace; fixed it.
@divyashreepathihalli How should we accommodate aux_loss for the CausalLM task model here? We are specifying SparseCategoricalCrossentropy loss here: keras-hub/keras_hub/src/models/causal_lm.py, lines 109 to 119 in b997444
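For context, Keras folds any losses registered via `add_loss` inside a layer into the total training loss alongside the compiled loss, so the router aux loss can live entirely in the MoE layer without touching `causal_lm.py`. A minimal numpy sketch of the effective objective (the coefficient and aux-loss values are illustrative, not the real model's numbers):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, logits):
    # Stable log-softmax, then take the log-prob of the true class.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(y_true)), y_true].mean()

y_true = np.array([0, 2])
logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
ce = sparse_categorical_crossentropy(y_true, logits)

aux_loss = 0.4   # e.g. a router load-balancing loss from the MoE layer
coef = 0.001     # router_aux_loss_coefficient (illustrative value)

# What model.fit effectively minimizes: compiled loss + add_loss terms.
total = ce + coef * aux_loss
```

This is why the `self.add_loss(...)` call in the MoE block is enough for training, with no change to the task-level loss definition.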
Thanks for the updates @heyyanshuman!
I left some comments on the PR regarding tf ops.
Please add tests for the layers, backbones, and tasks.
I am curious to know if model.fit works; do you have a demo colab for inference and fine-tuning? I'm looking for the aux loss implementation.
        )
        self._query_dense.build(inputs_shape)

        self._key_dense = keras.layers.EinsumDense(
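For readers unfamiliar with EinsumDense, these projections are just batched einsums over a learned weight tensor. A numpy sketch of what a query projection computes (the shapes and the equation string here are illustrative, not taken from the PR):

```python
import numpy as np

# Hypothetical dimensions; the real layer derives these from the config.
batch, seq, hidden, heads, head_dim = 2, 4, 8, 2, 4
x = np.ones((batch, seq, hidden))
w_query = np.full((hidden, heads, head_dim), 0.1)

# An EinsumDense with equation "abc,cde->abde" maps
# (batch, seq, hidden) -> (batch, seq, heads, head_dim).
query = np.einsum("abc,cde->abde", x, w_query)
```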
You might want to rename these to match other KerasHub models: value_dense and query_dense.
This will allow enabling LoRA on this model:
keras-hub/keras_hub/src/models/backbone.py
Line 195 in 7c86942
return ["query_dense", "value_dense", "query", "value"]
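The naming matters because LoRA targets are selected by layer name, as the list above suggests. A toy sketch of that kind of name matching (the function name is hypothetical; this mirrors the spirit, not the exact KerasHub code):

```python
# Names a LoRA-enabling helper might look for (taken from the list above).
LORA_TARGETS = ["query_dense", "value_dense", "query", "value"]

def lora_eligible(layer_name):
    # A layer is targeted if one of the known patterns appears in its name.
    return any(target in layer_name for target in LORA_TARGETS)
```

With a non-standard name like `_k_proj`, such a check would silently skip the layer, which is why matching the conventional names is worth it.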
I don't have access to this document :(
Sorry, copy-paste error! Updated the link:
keras-hub/keras_hub/src/models/backbone.py
Line 195 in 7c86942
return ["query_dense", "value_dense", "query", "value"]
Thanks @heyyanshuman - can you add a demo colab for inference and fit, and also provide a colab/screenshot for numerics verification?
Left a few more nit comments! Looking great overall!
Once we have the comments addressed and the inference and fit demo, it is ready for merge.
Thanks Anshuman! Left a few nit comments. Can you please add a presets file as well?
def main(_):
    # === Get the preset name ===
    # if FLAGS.preset not in PRESET_MAP.keys():
Uncomment the code.
    # )
    # preset = FLAGS.preset
    # hf_preset = PRESET_MAP[preset]
    hf_preset = "Qwen/Qwen1.5-MoE-A2.7B"
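One way to avoid the hardcoded handle is to resolve it from the flag through the preset map, as the commented-out lines intend. A sketch (the preset key below is illustrative, not an agreed-upon name):

```python
# Maps KerasHub preset names to Hugging Face model handles.
# The key here is a placeholder; the real presets file defines the names.
PRESET_MAP = {
    "qwen1.5_moe_2.7b_en": "Qwen/Qwen1.5-MoE-A2.7B",
}

def resolve_hf_preset(preset):
    # Fail loudly on unknown presets instead of silently falling back.
    if preset not in PRESET_MAP:
        raise ValueError(
            f"Invalid preset {preset!r}. Must be one of {list(PRESET_MAP)}."
        )
    return PRESET_MAP[preset]
```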
Let's not hardcode this.
            top_k=self.top_k,
            attention_mask=attention_mask,
        )
        self.add_loss(self.router_aux_loss_coefficient * aux_loss)
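For reference, the kind of load-balancing term typically passed to add_loss here follows the Switch Transformer formulation: num_experts times the dot product of each expert's routed-token fraction and its mean router probability. A numpy sketch (illustrative, not the exact Qwen MoE implementation):

```python
import numpy as np

def load_balancing_loss(router_logits, top_k):
    # router_logits: (num_tokens, num_experts)
    num_tokens, num_experts = router_logits.shape
    exp = np.exp(router_logits - router_logits.max(-1, keepdims=True))
    probs = exp / exp.sum(-1, keepdims=True)

    # Top-k expert indices chosen for each token.
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]

    # Fraction of routing slots assigned to each expert.
    counts = np.zeros(num_experts)
    for row in top_idx:
        counts[row] += 1
    tokens_per_expert = counts / (num_tokens * top_k)

    # Mean router probability per expert.
    prob_per_expert = probs.mean(axis=0)

    # Perfectly uniform router probabilities give a loss of exactly 1.0;
    # imbalanced routing pushes it higher.
    return num_experts * np.sum(tokens_per_expert * prob_per_expert)

rng = np.random.default_rng(0)
loss = load_balancing_loss(rng.normal(size=(8, 4)), top_k=2)
```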
Nice!!
Also, I want to see generate output matching.
This PR adds the Qwen Mixture-of-Experts model to KerasHub.
Hugging Face reference: link
Qwen MoE output matching: