
[Core] feat: Implement Priority Scheduling in V1 Engine #18700

Closed
amitm02 wants to merge 256 commits from the feat/v1-priority-scheduling branch

amitm02
Contributor

@amitm02 amitm02 commented May 26, 2025

This commit introduces priority scheduling capabilities to the V1 LLM engine.

Key changes include:

  1. EngineCoreRequest and Request updates:
    • Added a priority field to EngineCoreRequest and Request classes to carry priority information.
  2. Processor update:
    • Modified Processor.process_inputs to accept and pass the priority to EngineCoreRequest.
  3. V1 Scheduler modifications:
    • The scheduler now respects the --scheduling-policy argument.
    • When policy="priority", self.waiting is managed as a min-heap, prioritizing requests by their assigned priority value (lower value means higher priority) and then by arrival time (FCFS).
    • Preemption logic now correctly identifies and preempts the actual lowest-priority running request when space is needed for higher-priority or new requests.
    • FCFS behavior is maintained when policy="fcfs".
  4. Documentation:
    • Updated docs/usage/v1_guide.md and docs/serving/openai_compatible_server.md to reflect V1 engine's support for priority scheduling.
  5. Unit Tests:
    • Added a new test suite in tests/v1/core/test_scheduler.py.
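
The waiting-queue ordering described in item 3 can be illustrated with a minimal sketch. It assumes a standalone `heapq`-based queue; `WaitingRequest`, `PriorityWaitingQueue`, and `drain_order` are hypothetical names for illustration and are not the classes changed in this PR.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class WaitingRequest:
    # Hypothetical stand-in for a scheduler request (not vLLM's Request class).
    priority: int        # lower value means higher priority
    arrival_time: float  # tie-breaker: earlier arrival first (FCFS)
    seq: int             # final tie-breaker so the ordering is always total
    request_id: str = field(compare=False)


class PriorityWaitingQueue:
    """Min-heap of waiting requests ordered by (priority, arrival_time)."""

    def __init__(self) -> None:
        self._heap: list[WaitingRequest] = []
        self._counter = itertools.count()

    def add(self, request_id: str, priority: int, arrival_time: float) -> None:
        entry = WaitingRequest(priority, arrival_time, next(self._counter), request_id)
        heapq.heappush(self._heap, entry)

    def pop(self) -> str:
        # Returns the id of the highest-priority (smallest-key) request.
        return heapq.heappop(self._heap).request_id

    def __len__(self) -> int:
        return len(self._heap)


def drain_order(items):
    """Push (request_id, priority, arrival_time) tuples, then pop them all."""
    queue = PriorityWaitingQueue()
    for request_id, priority, arrival_time in items:
        queue.add(request_id, priority, arrival_time)
    return [queue.pop() for _ in range(len(queue))]
```

For example, `drain_order([("a", 1, 0.0), ("b", 0, 1.0), ("c", 1, 0.5), ("d", 0, 2.0)])` yields `["b", "d", "a", "c"]`: both priority-0 requests come out first in arrival order, followed by the priority-1 requests in arrival order.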

Together, these changes let users influence the order in which the V1 engine processes requests by assigning priorities, which is particularly useful when requests differ in importance.
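
The preemption rule from item 3 (preempt the lowest-priority running request) could look roughly like the sketch below. `RunningRequest` and `pick_preemption_victim` are hypothetical names. The tie-break on latest arrival and the FCFS fallback of preempting the last request in the list are assumptions made for illustration, not behavior confirmed by this PR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunningRequest:
    # Hypothetical stand-in for a running request; field names are illustrative.
    request_id: str
    priority: int        # lower value means higher priority
    arrival_time: float


def pick_preemption_victim(
    running: Sequence[RunningRequest], policy: str = "priority"
) -> Optional[RunningRequest]:
    """Choose which running request to preempt when space is needed."""
    if not running:
        return None
    if policy == "priority":
        # The largest priority value is the lowest priority.
        # Assumption: among equal priorities, the latest arrival is preempted.
        return max(running, key=lambda r: (r.priority, r.arrival_time))
    # Assumption for FCFS: preempt the most recently added running request.
    return running[-1]
```

Given running requests with priorities `[0, 5, 5, 1]` that arrived in that order, the priority policy selects the second priority-5 request, while the assumed FCFS fallback selects the last request in the list.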

FIX #14002


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation and v1 labels May 26, 2025
@amitm02 amitm02 force-pushed the feat/v1-priority-scheduling branch from dbdfa5b to 8b54316 Compare May 27, 2025 12:08
@mergify mergify bot added the ci/build, frontend, multi-modality, structured-output, and tpu labels May 27, 2025
@mergify mergify bot added the tool-calling label May 27, 2025
@mergify mergify bot removed the tpu label May 27, 2025
njhill and others added 14 commits June 1, 2025 17:57
@mergify mergify bot added the speculative-decoding and tpu labels Jun 1, 2025

mergify bot commented Jun 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @amitm02.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 1, 2025
@mergify mergify bot removed the tpu and needs-rebase labels Jun 1, 2025
amitm02 added 3 commits June 1, 2025 18:08
@youkaichao
Member

the commit history is in a mess, can you clean it up? maybe open another PR?

@amitm02
Contributor Author

amitm02 commented Jun 3, 2025

> the commit history is in a mess, can you clean it up? maybe open another PR?

Re-submitted as #19057

Successfully merging this pull request may close these issues.

[Feature]: Implement Priority Scheduling In V1 Engine