vllm-project / vllm Public

Fork 6.7k
Star 43.7k


Pull requests: vllm-project/vllm


554 Open 7,338 Closed


Pull requests list

[MISC] Use less CPU when message queue has been empty for some time

#16226 opened Apr 8, 2025 by p12tic


[Misc] Merge the logs of pp layers partitions

#16225 opened Apr 8, 2025 by kebe7jun


[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding

Labels: bug ("Something isn't working"), ci/build, ready ("ONLY add when PR is ready to merge/full CI is needed")

#16221 opened Apr 8, 2025 by mgoin

[Bugfix] Do not skip "empty" parts of chats that are parsable

Labels: frontend, multi-modality ("Related to multi-modality (#4194)"), ready

#16219 opened Apr 7, 2025 by mgoin

[Model] set default attn tmp scaling to True for llama4

#16216 opened Apr 7, 2025 by luccafong • Draft

[Docs] Add Slides from Singapore Meetup

Labels: documentation ("Improvements or additions to documentation")

#16213 opened Apr 7, 2025 by simon-mo

Add warning for Attention backends that do not support irope yet

Labels: ready, tpu ("Related to Google TPUs")

#16212 opened Apr 7, 2025 by sarckk

[Hardware][Google] Track TPU usages in vLLM's data dashboards

Labels: ready

#16211 opened Apr 7, 2025 by dyli-google

[WIP][BugFix] potentially fix index error

Labels: v1

#16209 opened Apr 7, 2025 by LucasWilkinson • Draft

[BugFix] Remove 224 from FA supported list

Labels: ready

#16206 opened Apr 7, 2025 by LucasWilkinson

[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe

#16203 opened Apr 7, 2025 by lengrongfu


[Bugfix] Fix profiling.py

Labels: documentation

#16202 opened Apr 7, 2025 by hhy3

[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs

#16198 opened Apr 7, 2025 by tjtanaa


[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version

Labels: ci/build, ready

#16194 opened Apr 7, 2025 by Isotr0py

[WIP]: Support embedding models in V1 frontend

Labels: tpu

#16188 opened Apr 7, 2025 by maxdebayser • Draft

[WIP] Hybrid Memory Allocator

Labels: v1

#16178 opened Apr 7, 2025 by WoosukKwon • Draft

[Kernel] support cuda merge_attn_states kernel, max ~3x improved

Labels: ci/build, v1

#16173 opened Apr 7, 2025 by DefTruth • Draft

1 of 4 tasks

[TPU][V1] Disable per-request seed/Generator

Labels: ci/build, tpu

#16172 opened Apr 7, 2025 by NickLucche

[V1][Structured Output] Add validate_grammar() method to StructuredOutputBackend

Labels: v1

#16171 opened Apr 7, 2025 by shen-shanshan

[model] fix load_weights function for dpsk fusedmoe class

#16170 opened Apr 7, 2025 by BearBiscuit05


[Feature] Estimate max-model-len use available KV cache memory

Labels: v1

#16168 opened Apr 7, 2025 by lengrongfu

[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1

Labels: ready

#16161 opened Apr 7, 2025 by zxfan-cpu

[Draft] SnapKV

Labels: v1

#16160 opened Apr 7, 2025 by yeyang-zhou • Draft

[V1][Core] Add async kv cache offload

Labels: ci/build, v1

#16159 opened Apr 7, 2025 by zeroorhero

[WIP] Proper input validation for multi-modal encoder-decoder models

Labels: v1

#16156 opened Apr 7, 2025 by DarkLight1337 • Draft
