Add Qwen Moe #2163


Open · wants to merge 31 commits into master
Conversation

kanpuriyanawab
Collaborator

@kanpuriyanawab kanpuriyanawab commented Mar 23, 2025

This PR adds the Qwen Mixture-of-Experts (MoE) model to Keras Hub.

Hugging Face reference: link

Qwen MoE output matching:

[Screenshot 2025-04-20 at 2:18:58 PM]

@kanpuriyanawab kanpuriyanawab self-assigned this Mar 29, 2025
@kanpuriyanawab kanpuriyanawab marked this pull request as ready for review March 29, 2025 05:06
@mattdangerw mattdangerw removed the request for review from divyashreepathihalli March 31, 2025 16:41
@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Mar 31, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Mar 31, 2025
Collaborator

@divyashreepathihalli left a comment

Just reviewed the MoE part of the code.

@@ -79,7 +79,7 @@ def build(self, decoder_sequence_shape):
        self.hidden_dim = decoder_sequence_shape[-1]

        # Self attention layer.
-       self._self_attention_layer = QwenAttention(
+       self._self_attention_layer = QwenMoeAttention(
Collaborator

I see you have forked this file in qwen_moe folder - why is this being edited?

Collaborator Author

Looks like this slipped in during a find & replace; fixed it.

@kanpuriyanawab
Collaborator Author

@divyashreepathihalli How should we accommodate the aux_loss for the CausalLM task model here?

We are specifying SparseCategoricalCrossentropy loss here:

if optimizer == "auto":
    optimizer = keras.optimizers.Adam(2e-5)
if loss == "auto":
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
if weighted_metrics == "auto":
    weighted_metrics = [keras.metrics.SparseCategoricalAccuracy()]
super().compile(
    optimizer=optimizer,
    loss=loss,
    weighted_metrics=weighted_metrics,
    **kwargs,
)
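For context: in Keras, losses registered with `add_loss` inside a layer's `call` are summed with the compiled loss during `fit`, so the router aux loss does not need to appear in `compile()` at all. A rough numpy sketch of the resulting total loss, using hypothetical logits, labels, and aux loss values (none of these numbers come from the model):

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    # Softmax cross-entropy against integer labels, from logits.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical batch of logits and labels.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
task_loss = sparse_categorical_crossentropy(logits, labels)

# Assumed coefficient and an aux loss value as the MoE layer would
# have registered via add_loss; Keras adds it to the compiled loss.
router_aux_loss_coefficient = 0.001
aux_loss = 0.42
total_loss = task_loss + router_aux_loss_coefficient * aux_loss
print(task_loss, total_loss)
```

This is why the `compile()` above can stay unchanged: the aux term rides along automatically once the layer calls `add_loss`.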

Collaborator

@divyashreepathihalli left a comment

Thanks for the updates @heyyanshuman!
I left some comments on the PR regarding tf ops.
Please add tests for the layers, backbones, and tasks.
I am curious whether model.fit works. Do you have a demo colab for inference and fine-tuning? I am looking for the aux loss implementation.

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(
Collaborator

@divyashreepathihalli Apr 10, 2025

You might want to rename these to match other KH models here (value_dense and query_dense).
This will allow enabling LoRA on this model:

return ["query_dense", "value_dense", "query", "value"]
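To illustrate why the names matter: LoRA target layers are selected by matching patterns like the ones above against layer names. The helper below is a hypothetical sketch of that name-based selection, not keras-hub's actual implementation:

```python
# Illustrative only: LoRA targeting works by matching target patterns
# against layer names, so non-standard names are silently skipped.
def lora_targets(layer_names, patterns):
    """Return the layer names matched by any LoRA target pattern."""
    return [name for name in layer_names if any(p in name for p in patterns)]

patterns = ["query_dense", "value_dense", "query", "value"]

# With KH-style names, the query/value projections are picked up.
kh_names = ["query_dense", "key_dense", "value_dense", "output_dense"]
print(lora_targets(kh_names, patterns))    # ['query_dense', 'value_dense']

# With non-matching names like "_q_proj", nothing is selected.
other_names = ["_q_proj", "_k_proj", "_v_proj"]
print(lora_targets(other_names, patterns))  # []
```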

Collaborator Author

I don't have access to this document :(

Collaborator

Sorry, copy-paste error! Updated the link:

return ["query_dense", "value_dense", "query", "value"]

Collaborator

@divyashreepathihalli left a comment

Thanks @heyyanshuman - can you add a demo colab for inference and fit? And also provide a colab/screenshot for numerics verification?

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry! wrong copy pasta error! updated the link -

return ["query_dense", "value_dense", "query", "value"]

@kanpuriyanawab
Collaborator Author

Output matching screenshot:

[image]

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Apr 15, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Apr 15, 2025
Collaborator

@divyashreepathihalli left a comment

Left a few more nit comments! Looking great overall!
Once the comments are addressed and the inference and fit demos are added, it is ready for merge.

@kanpuriyanawab
Collaborator Author

Qwen MoE output matching:

[Screenshot 2025-04-20 at 2:18:58 PM]

Collaborator

@divyashreepathihalli left a comment

Thanks Anshuman! Left a few nit comments. Can you please add a presets file as well?


def main(_):
# === Get the preset name ===
# if FLAGS.preset not in PRESET_MAP.keys():
Collaborator

uncomment the code

# )
# preset = FLAGS.preset
# hf_preset = PRESET_MAP[preset]
hf_preset = "Qwen/Qwen1.5-MoE-A2.7B"
Collaborator

let us not hardcode this
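A minimal sketch of the flag-driven preset lookup the reviewer is asking for. The map contents and preset name below are assumptions based on the hardcoded checkpoint above, not the final preset list:

```python
# Hypothetical preset map for the conversion script; only the
# Qwen1.5-MoE-A2.7B entry comes from the hardcoded value above.
PRESET_MAP = {
    "qwen1.5_moe_2.7b_en": "Qwen/Qwen1.5-MoE-A2.7B",
}

def resolve_hf_preset(preset):
    """Look up the Hugging Face checkpoint for a preset name."""
    if preset not in PRESET_MAP:
        raise ValueError(
            f"Invalid preset {preset!r}. Must be one of {list(PRESET_MAP)}"
        )
    return PRESET_MAP[preset]

print(resolve_hf_preset("qwen1.5_moe_2.7b_en"))  # Qwen/Qwen1.5-MoE-A2.7B
```

With this shape, `FLAGS.preset` selects the checkpoint and an unknown preset fails fast instead of silently converting the wrong model.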

top_k=self.top_k,
attention_mask=attention_mask,
)
self.add_loss(self.router_aux_loss_coefficient * aux_loss)
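For readers unfamiliar with the router aux loss: MoE models typically add a load-balancing term so tokens spread evenly across experts, commonly in the style of the Switch Transformer loss (num_experts times the sum over experts of token fraction times mean router probability). The numpy sketch below illustrates that formulation under that assumption; the Qwen implementation may differ in details:

```python
import numpy as np

def load_balancing_aux_loss(router_logits, top_k):
    """Switch-Transformer-style load balancing loss (illustrative).

    router_logits: (num_tokens, num_experts) raw router scores.
    Penalizes imbalance between the fraction of tokens routed to
    each expert (f_i) and the mean router probability (P_i).
    """
    num_tokens, num_experts = router_logits.shape
    # Softmax over the expert dimension.
    exp = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Hard top-k expert assignment per token.
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, top_idx, 1.0, axis=-1)
    tokens_per_expert = mask.mean(axis=0)        # f_i
    router_prob_per_expert = probs.mean(axis=0)  # P_i
    return num_experts * np.sum(tokens_per_expert * router_prob_per_expert)

# With uniform router probabilities the loss evaluates to 1.0;
# skewed routing pushes it higher.
uniform_logits = np.zeros((8, 4))
print(load_balancing_aux_loss(uniform_logits, top_k=1))  # 1.0
```

Scaled by `router_aux_loss_coefficient` and registered via `add_loss` as above, this term is minimized when routing is balanced.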
Collaborator

nice!!

@divyashreepathihalli
Collaborator

Also, I want to see generate output matching.
