
[Core] feat: Implement Priority Scheduling in V1 Engine #19057


Merged
47 commits merged into vllm-project:main on Jun 23, 2025

Conversation

@amitm02 (Contributor) commented Jun 3, 2025

This commit introduces priority scheduling capabilities to the V1 LLM engine.

Key changes include:

- EngineCoreRequest and Request updates:
  - Added a priority field to the EngineCoreRequest and Request classes to carry priority information.
- Processor update:
  - Modified Processor.process_inputs to accept the priority and pass it through to EngineCoreRequest.
- V1 Scheduler modifications:
  - The scheduler now respects the --scheduling-policy argument.
  - When policy="priority", self.waiting is managed as a min-heap, ordering requests by their assigned priority value (a lower value means higher priority) and then by arrival time (FCFS). A minimal sketch of this ordering follows the description below.
  - Preemption logic now correctly identifies and preempts the lowest-priority running request when space is needed for higher-priority or new requests.
  - FCFS behavior is maintained when policy="fcfs".
- Documentation:
  - Updated docs/usage/v1_guide.md and docs/serving/openai_compatible_server.md to reflect the V1 engine's support for priority scheduling.
- Unit Tests:
  - Added a new test suite in tests/v1/core/test_scheduler.py.

This lets you influence the order of request processing in the V1 engine by assigning priorities, which is particularly useful when requests vary in importance.

FIX #14002
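
For illustration, a minimal, self-contained sketch of the ordering described above (not the PR's actual scheduler code): waiting requests are kept in a min-heap keyed by (priority, arrival_time), so a lower priority value is served first and ties fall back to arrival order. The Request stand-in and queue class below are illustrative names, not vLLM's classes.

import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Request:
    # Illustrative stand-in for vLLM's Request; only the fields needed to
    # demonstrate the ordering are included.
    request_id: str
    priority: int
    arrival_time: float


class PriorityWaitingQueue:
    """Min-heap of waiting requests: lower priority value first, then FCFS."""

    def __init__(self) -> None:
        # The monotonically increasing counter breaks ties so that Request
        # objects themselves are never compared.
        self._heap: list[tuple[int, float, int, Request]] = []
        self._counter = itertools.count()

    def add_request(self, request: Request) -> None:
        heapq.heappush(
            self._heap,
            (request.priority, request.arrival_time, next(self._counter), request),
        )

    def pop_request(self) -> Request:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


# A later-arriving but higher-priority (lower value) request is served first.
queue = PriorityWaitingQueue()
queue.add_request(Request("a", priority=1, arrival_time=0.0))
queue.add_request(Request("b", priority=0, arrival_time=1.0))
assert queue.pop_request().request_id == "b"
assert queue.pop_request().request_id == "a"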

@mergify mergify bot added the documentation and v1 labels Jun 3, 2025

github-actions bot commented Jun 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Jun 4, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @amitm02.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jun 4, 2025
@aarnphm (Collaborator) left a comment

Just a few quick notes about formatting changes, to help with reviewing the core logic of the scheduler.

For the types of the waiting queue, I think we can implement a deque subclass that carries a priority ranking per request (see the sketch below).
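
One possible reading of this suggestion, as a hedged sketch rather than the PR's final design: a small abstract queue interface plus an FCFS implementation that subclasses deque directly. The RequestQueue and FCFSRequestQueue names match the snippet quoted later in this thread; the exact method surface shown here is an assumption.

from abc import ABC, abstractmethod
from collections import deque


class RequestQueue(ABC):
    """Assumed minimal interface for the scheduler's waiting queue."""

    @abstractmethod
    def add_request(self, request) -> None: ...

    @abstractmethod
    def pop_request(self): ...

    @abstractmethod
    def peek_request(self): ...

    @abstractmethod
    def __len__(self) -> int: ...


class FCFSRequestQueue(deque, RequestQueue):
    """FCFS policy as a thin deque subclass."""

    def add_request(self, request) -> None:
        self.append(request)

    def pop_request(self):
        return self.popleft()

    def peek_request(self):
        if len(self) == 0:
            raise IndexError("peek from an empty queue")
        return self[0]


# Usage: plain FIFO behavior through the queue interface.
q = FCFSRequestQueue()
q.add_request("req-1")
q.add_request("req-2")
assert q.peek_request() == "req-1"
assert q.pop_request() == "req-1"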

@amitm02 amitm02 requested review from ywang96 and WoosukKwon June 19, 2025 16:01
@amitm02 amitm02 requested a review from njhill June 19, 2025 17:32
@aarnphm (Collaborator) left a comment

I think the code looks a lot cleaner. Thanks for this.

But I'm not sure about the degradation in FCFS. 5% is quite a huge margin...

@amitm02 amitm02 requested a review from aarnphm June 20, 2025 09:44
@amitm02 (Contributor, Author) commented Jun 20, 2025

> I think the code looks a lot cleaner. Thanks for this.
>
> But I'm not sure about the degradation in FCFS. 5% is quite a huge margin...

Thanks! Glad the cleanup looks better.

Regarding the FCFS degradation: while it's around 5%, we're still seeing ~18k ops/sec, which is far beyond where vLLM's actual bottleneck lies. I don't think it's worth optimizing something that already performs well beyond what the system can realistically handle; it feels like premature optimization at this point.

@amitm02 amitm02 requested a review from hmellor June 22, 2025 15:35
@WoosukKwon WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 22, 2025

def peek_request(self) -> Request:
    """Peek at the next request in the queue without removing it."""
    if not self:

A collaborator left an inline review comment with a suggested change:

-    if not self:
+    if len(self) == 0:

@WoosukKwon (Collaborator) left a comment

@amitm02 Thanks for all the fixes! Overall, I think the PR is in good enough shape to be merged. I have small concerns about the overhead of finish_request and preemption, but I think it's acceptable (and we don't have a better approach at hand right now).

One last issue is the performance overhead in the default FCFS setting. I found that this PR causes a ~1% perf degradation in the sharegpt benchmark, but the overhead is eliminated by the following change:

# Note: Request and RequestQueue are vLLM-internal types (the request object
# and the abstract waiting-queue interface introduced in this PR); the stdlib
# imports below are added here for context.
from collections import deque
from collections.abc import Iterable


class FCFSRequestQueue(deque[Request], RequestQueue):
    """A first-come-first-served queue that supports deque operations."""

    # Avoid the Python function call overheads by directly calling the parent
    # methods.
    add_request = deque.append
    pop_request = deque.popleft
    prepend_request = deque.appendleft
    prepend_requests = deque.extendleft
    remove_request = deque.remove
    __len__ = deque.__len__
    __iter__ = deque.__iter__
    __reversed__ = deque.__reversed__

    def __bool__(self) -> bool:
        """Check if queue has any requests."""
        return len(self) > 0

    def peek_request(self) -> Request:
        """Peek at the next request in the queue without removing it."""
        if len(self) == 0:
            raise IndexError("peek from an empty queue")
        return self[0]

    def remove_requests(self, requests: Iterable[Request]) -> None:
        """Remove multiple specific requests from the queue."""
        requests_to_remove = set(requests)
        filtered_requests = [
            req for req in self if req not in requests_to_remove
        ]
        # deque does not support in-place filtering, so we need to clear
        # and extend
        self.clear()
        self.extend(filtered_requests)

The code doesn't look so good, but we can fix it later after we introduce async scheduling.
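
For what it's worth, the trick in the snippet above is binding the parent class's C-implemented methods directly as class attributes, so a call like queue.add_request(req) dispatches straight into deque.append without an extra Python-level frame. A rough, hedged illustration of the difference (the class names below are made up for the comparison):

from collections import deque
from timeit import timeit


class WrappedQueue(deque):
    """Delegates through a Python method, adding one frame per call."""

    def add_request(self, item) -> None:
        self.append(item)


class BoundQueue(deque):
    """Binds the C-implemented parent method directly: no extra frame."""

    add_request = deque.append


wrapped, bound = WrappedQueue(), BoundQueue()
print("wrapped:", timeit(lambda: wrapped.add_request(1), number=1_000_000))
print("bound:  ", timeit(lambda: bound.add_request(1), number=1_000_000))

On CPython the directly bound variant is typically noticeably cheaper per call, which is consistent with the ~1% end-to-end gap reported above disappearing.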

@WoosukKwon WoosukKwon merged commit 4a0f788 into vllm-project:main Jun 23, 2025
76 of 77 checks passed
juncheoll pushed a commit to juncheoll/vllm that referenced this pull request Jun 23, 2025
fhl2000 pushed a commit to fhl2000/vllm that referenced this pull request Jun 25, 2025
Labels
ci/build, documentation, frontend, llama, multi-modality, ready, speculative-decoding, structured-output, tool-calling, v1
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Feature]: Implement Priority Scheduling In V1 Engine
7 participants