[Feature]: Support Pipeline Parallelism on Llama-4-Maverick-17B-128E #16231
Comments
I think trunk should already have PP support. @zhewenl, could you help with verification?
@Edwinhr716 rebase to the latest main and try again; it should be supported: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/mllama4.py#L668-L669
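For reference, one quick way to verify this on an installed build is to check whether the registered Llama 4 model class declares PP support. This is only a sketch: the module path, the class name `Llama4ForConditionalGeneration`, and the `supports_pp` helper are assumptions based on the linked file and may differ between vLLM versions.

```bash
# Hedged sketch: should print True if the installed vLLM build marks the
# Llama 4 implementation as pipeline-parallel capable. Module, class, and
# helper names are assumptions and may vary between vLLM versions.
python -c "
from vllm.model_executor.models.interfaces import supports_pp
from vllm.model_executor.models.mllama4 import Llama4ForConditionalGeneration
print(supports_pp(Llama4ForConditionalGeneration))
"
```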
Sweet, so it should be available in the 0.8.4 release?
It will. Before it's ready, feel free to try our nightly: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#pre-built-wheels
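For completeness, installing a nightly build typically looks something like the sketch below. The `--extra-index-url` value is an assumption based on the linked pre-built-wheels docs and should be double-checked there, since it may change.

```bash
# Hedged sketch: install the latest nightly vLLM wheel (per the linked
# pre-built-wheels docs). Verify the index URL against the docs page,
# as it may change between releases.
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```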
Okay, I will close this issue; feel free to re-open if anything else is needed :-)
🚀 The feature, motivation and pitch
I'm attempting to deploy Llama-4-Maverick-17B-128E across 16 H100s on two nodes, running this command:
I got this message saying that pipeline parallelism (PP) isn't supported.
Llama-4-Maverick-17B-128E is a large LLM that most people will be running across multiple GPU nodes.
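For context, a two-node launch of this kind (not the reporter's exact command, which is not captured above) would typically combine tensor parallelism within each node and pipeline parallelism across nodes, e.g. TP=8 and PP=2 on 16 H100s over a Ray cluster. A hedged sketch, with the model path, addresses, and port as illustrative placeholders:

```bash
# Hedged sketch of a 2-node deployment (8 GPUs per node): TP=8 within a node,
# PP=2 across nodes, coordinated through a Ray cluster. Model path, addresses,
# and port are illustrative placeholders, not values from the original report.

# On the head node:
ray start --head --port=6379

# On the second node:
ray start --address=<head-node-ip>:6379

# Then, on the head node:
vllm serve meta-llama/Llama-4-Maverick-17B-128E \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```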
Alternatives
No response
Additional context
No response