
[Feature]: Support Pipeline Parallelism on Llama-4-Maverick-17B-128E #16231


Closed
Edwinhr716 opened this issue Apr 8, 2025 · 5 comments
Labels
feature request New feature or request

Comments

@Edwinhr716
Contributor

Edwinhr716 commented Apr 8, 2025

🚀 The feature, motivation and pitch

I'm attempting to deploy Llama-4-Maverick-17B-128E across 16 H100s on two nodes, running this command:

python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Llama-4-Maverick-17B-128E-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2

I got this message saying that PP isn't supported:

NotImplementedError: Pipeline parallelism is not supported for this model. Supported models implement the `SupportsPP` interface.
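
For context, that error comes from vLLM's capability check for pipeline parallelism: PP is only enabled for model classes that implement the `SupportsPP` interface, i.e. that can create and pass hidden states between pipeline stages. A rough sketch of the pattern (import paths, method signatures, and the hidden size are assumptions recalled from vLLM's model interfaces and may differ between versions):

from typing import Optional, Union

import torch
import torch.nn as nn

from vllm.sequence import IntermediateTensors  # assumed location


class MyPipelineFriendlyModel(nn.Module):
    # In vLLM the model class would also inherit SupportsPP so the engine's
    # capability check passes; omitted here to keep the sketch self-contained.

    def make_empty_intermediate_tensors(
        self, batch_size: int, dtype: torch.dtype, device: torch.device
    ) -> IntermediateTensors:
        # Placeholder tensors used to receive hidden states from the previous stage.
        hidden_size = 5120  # placeholder; real models take this from their config
        return IntermediateTensors({
            "hidden_states": torch.zeros(
                batch_size, hidden_size, dtype=dtype, device=device
            ),
        })

    def forward(
        self,
        input_ids: torch.Tensor,
        positions: torch.Tensor,
        intermediate_tensors: Optional[IntermediateTensors] = None,
    ) -> Union[torch.Tensor, IntermediateTensors]:
        # Stages other than the first consume intermediate_tensors; stages other
        # than the last return an IntermediateTensors instead of final hidden states.
        ...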

Llama-4-Maverick-17B-128E is a large LLM that most people will be running across multiple GPU nodes.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Edwinhr716 added the feature request label on Apr 8, 2025
houseroad self-assigned and then unassigned this issue on Apr 8, 2025
@houseroad
Collaborator

houseroad commented Apr 8, 2025

I think trunk should already have PP support. @zhewenl, could you help verify?

@zhewenl
Collaborator

zhewenl commented Apr 8, 2025

@Edwinhr716 rebase to the latest main and try again; it should be supported: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/mllama4.py#L668-L669
(I was unable to reproduce your issue: https://gist.github.com/zhewenl/c2a946bbc0c24450bd469aa29f836784)
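
If you want to double-check what the installed build reports before redeploying, something like this should do it (the helper and class names here are assumptions based on the linked file and vLLM's interfaces module, so they may differ across versions):

# Check whether the installed vLLM marks the Llama-4 model class as PP-capable.
from vllm.model_executor.models.interfaces import supports_pp  # assumed helper
from vllm.model_executor.models.mllama4 import Llama4ForConditionalGeneration  # assumed class name

print(supports_pp(Llama4ForConditionalGeneration))  # expect True on a recent main build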

@Edwinhr716
Contributor Author

Sweet, so it should be available in the 0.8.4 release?

@houseroad
Collaborator

It will. Before it's ready, feel free to try our nightly: https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#pre-built-wheels
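
After installing a pre-built wheel, a quick way to confirm which build is actually active (the dev-suffix detail is just an assumption about how nightly wheels are versioned):

import vllm
print(vllm.__version__)  # nightly/pre-release builds typically carry a dev suffix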

@houseroad
Collaborator

Okay, I will close this issue. Feel free to re-open if anything else is needed :-)
