Add Qwen3 #1835

bzantium · 2025-06-14T07:44:10Z

Description

Add Qwen3 models.
The implementation is very similar to llama2 except for qk_norm, so I only added use_qk_norm. I also added query_pre_attn_scalar, which is head_dim ** 0.5, as in llama4.

maxtext/MaxText/layers/qwen3.py

Lines 106 to 107 in f4a5e24

    
    use_qk_norm=cfg.use_qk_norm,
    query_pre_attn_scalar=query_pre_attn_scalar,
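
For readers unfamiliar with the two options, here is a minimal pure-Python sketch of what they do to a single attention logit. It is not the MaxText code: rms_norm and attention_logit are hypothetical helpers written for illustration, the learned norm weights are omitted, and the sketch assumes the query is divided by head_dim ** 0.5 (the usual scaled dot-product convention), which the PR text does not spell out.

```python
import math


def rms_norm(vec, eps=1e-6):
    """RMS-normalize one head's vector (hypothetical helper, no learned scale)."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms for x in vec]


def attention_logit(query, key, use_qk_norm=True):
    """Return the pre-softmax score for one query/key pair of a single head."""
    head_dim = len(query)
    if use_qk_norm:
        # QK norm: normalize query and key per head before the dot product.
        query = rms_norm(query)
        key = rms_norm(key)
    # Assumption: the scalar is head_dim ** 0.5 and the query is divided by it.
    query_pre_attn_scalar = head_dim ** 0.5
    query = [q / query_pre_attn_scalar for q in query]
    return sum(q * k for q, k in zip(query, key))


if __name__ == "__main__":
    # With QK norm the logit no longer depends on the magnitude of the query.
    print(attention_logit([3.0, 4.0], [3.0, 4.0]))
    print(attention_logit([30.0, 40.0], [3.0, 4.0]))
    print(attention_logit([30.0, 40.0], [3.0, 4.0], use_qk_norm=False))
```

With use_qk_norm enabled, the first two calls give nearly the same score even though the second query is ten times larger; with it disabled, the score grows with the query magnitude.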

FIXES: #1834

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

bzantium added 2 commits June 14, 2025 16:40

Add Qwen3

7892744

make style

f4a5e24

bzantium requested review from gobbleturk, khatwanimohit, bvandermoon, vipannalla, RissyRan, richjames0, gagika, shralex, yangyuwei, SurbhiJainUSC, hengtaoguo, A9isha and aireenmei as code owners June 14, 2025 07:44

set use_qk_norm true

d325e91
