
Llama4TextExperts module implementation #37325

Open
Godofnothing opened this issue Apr 6, 2025 · 1 comment
Labels: bug, Usage (General questions about the library)

Comments


Godofnothing commented Apr 6, 2025

System Info

The Llama4 model family adopts an MoE layer for better efficiency.

However, in the current implementation the MoE layer in fact performs an ordinary dense FFN forward pass, with all experts involved in the computation. One can see that the gate_up_proj matrix has the same shape as if all num_experts were active.

[Screenshot: Llama4TextExperts implementation, showing gate_up_proj sized for all experts]

I guess the intent was to perform the computation only for the experts selected by the router.
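
For illustration, here is a minimal sketch of why this amounts to a dense pass. This is not the actual transformers code; the parameter names and shapes follow the observation above and are otherwise assumptions. The point is that gate_up_proj and down_proj are allocated for all num_experts, and one batched matmul runs every expert over every token, so FLOPs scale with num_experts rather than with the router's top-k.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration only
num_experts, hidden_size, intermediate_size, tokens = 16, 64, 128, 8

# Per-expert weights stacked along dim 0 -- sized for ALL experts
gate_up_proj = torch.randn(num_experts, hidden_size, 2 * intermediate_size)
down_proj = torch.randn(num_experts, intermediate_size, hidden_size)
x = torch.randn(tokens, hidden_size)

# Every expert processes every token via one batched matmul
x_rep = x.unsqueeze(0).repeat(num_experts, 1, 1)            # (E, tokens, hidden)
gate, up = torch.bmm(x_rep, gate_up_proj).chunk(2, dim=-1)  # fused gate/up projection
expert_out = torch.bmm(F.silu(gate) * up, down_proj)        # (E, tokens, hidden)
print(expert_out.shape)
```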

Who can help?

@ArthurZucker

Reproduction

Any usage of the model

Expected behavior

Only the experts chosen by the router should be involved in the computation.
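
A hedged sketch of this expected sparse behavior, where each token is dispatched only to its top-k experts. The function and argument names (sparse_moe_forward, router_logits, experts) are illustrative, not the library's API.

```python
import torch

def sparse_moe_forward(hidden_states, router_logits, experts, top_k=1):
    # hidden_states: (tokens, hidden); router_logits: (tokens, num_experts)
    scores = router_logits.softmax(dim=-1)
    weights, selected = scores.topk(top_k, dim=-1)            # both (tokens, top_k)
    out = torch.zeros_like(hidden_states)
    for expert_idx, expert in enumerate(experts):
        # indices of the tokens routed to this expert
        token_idx, k_idx = (selected == expert_idx).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue                                          # idle experts do no work
        expert_out = expert(hidden_states[token_idx])         # only the routed tokens
        out.index_add_(0, token_idx,
                       expert_out * weights[token_idx, k_idx].unsqueeze(-1))
    return out
```

Here experts would be a list of per-expert FFN modules mapping (n, hidden) to (n, hidden); idle experts are skipped entirely, so compute scales with top_k rather than num_experts, at the cost of gather/scatter bookkeeping.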

ArthurZucker added the Usage (General questions about the library) label on Apr 7, 2025
ArthurZucker (Collaborator) commented:

Hey! There are different formulations of MoE; we went with the one that requires a bit more memory but is more time efficient. We use tensor parallelism to alleviate some of the problems, and we expand the inputs so that all experts see all inputs. It's not highly optimized, but we'll add a better implementation soon!
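
For context, a rough sketch of the expand-all-inputs formulation described here, assuming stacked per-expert weights and softmax router scores (names and shapes are assumptions, not the actual implementation): the tokens are replicated once per expert, all experts run in one batched matmul, and the router scores only rescale the per-expert outputs. This trades the gather/scatter of sparse dispatch for extra activation memory.

```python
import torch
import torch.nn.functional as F

def expanded_moe_forward(hidden_states, router_logits, gate_up_proj, down_proj):
    # hidden_states: (tokens, hidden); gate_up_proj: (E, hidden, 2*inter)
    # down_proj: (E, inter, hidden); router_logits: (tokens, E)
    num_experts = gate_up_proj.shape[0]
    scores = router_logits.softmax(dim=-1)                     # (tokens, E)
    # replicate the tokens once per expert -- the "bit more memory"
    x = hidden_states.unsqueeze(0).repeat(num_experts, 1, 1)   # (E, tokens, hidden)
    gate, up = torch.bmm(x, gate_up_proj).chunk(2, dim=-1)
    expert_out = torch.bmm(F.silu(gate) * up, down_proj)       # (E, tokens, hidden)
    # scale each expert's output by its router score and sum over experts;
    # unselected experts would contribute with (near-)zero weight
    return (scores.t().unsqueeze(-1) * expert_out).sum(dim=0)  # (tokens, hidden)
```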
