[Model] Automatic conversion of score (CrossEncoding) models #19675
Conversation
Summary of Changes
Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the automatic model conversion mechanism to support both classification and scoring tasks under a unified `as_seq_cls_model` adapter. It updates the model loading utility to recognize the `score` task and applies the generic adapter. The adapter itself is refactored to handle pooling and scoring more flexibly, and the Qwen3 sequence classification model is updated to utilize this new generic adapter.
Highlights
- Score Task Support: The automatic model conversion logic in the model loader (`vllm/model_executor/model_loader/utils.py`) is updated to use the `as_seq_cls_model` adapter for models specified with the `score` task, in addition to the existing `classify` task.
- Adapter Refactoring: The `as_seq_cls_model` adapter in `vllm/model_executor/models/adapters.py` is refactored. The pooling and scoring logic is moved into a dedicated `pooler` method, allowing for more flexible handling of different pooling types within the adapter. It also adds a check to squeeze the output dimension for the `score` task (see the sketch after this list).
- Qwen3 Reranker Integration: The specific `Qwen3ForSequenceClassification` implementation is updated to inherit from the new generic `as_seq_cls_model` adapter, simplifying its structure and leveraging the shared adapter logic. Specific Qwen3 reranker configuration verification is moved into a new `config_verify` method.
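To illustrate the squeeze mentioned above, here is a minimal sketch (a hypothetical helper, not the PR's actual code) of how a `score`-task output with `num_labels == 1` has its label dimension removed:

```python
import torch

def squeeze_score_output(logits: torch.Tensor, task: str) -> torch.Tensor:
    """Drop the label dimension for the "score" task (num_labels == 1)."""
    if task == "score":
        assert logits.shape[-1] == 1, "score task expects num_labels == 1"
        return logits.squeeze(-1)  # [num_seqs, 1] -> [num_seqs]
    return logits
```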
Code Review
This pull request focuses on enabling automatic conversion of score models by adapting the sequence classification model functionality. Key changes include:
- Renaming `as_classification_model` to `as_seq_cls_model` and updating its functionality to support both "classify" and "score" tasks. This is consistently applied across documentation, tests, and model loading utilities.
- Refactoring `Qwen3ForSequenceClassification` to leverage the new `as_seq_cls_model` adapter. This promotes code reuse and centralizes the classification/scoring logic.
- Introducing a `config_verify` method in the adapter pattern, allowing model-specific configurations, which is well-utilized by `Qwen3ForSequenceClassification` for its reranker variant.
- Ensuring that for "score" tasks, the model expects `num_labels == 1` and the output is appropriately processed (squeezed).
The changes appear robust and improve the model adaptation framework. One area for potential clarification is the behavior of `PoolingType.ALL` within the `as_seq_cls_model` adapter, as noted in the specific comment.
Please also consider filling out the checklist in the PR description (Purpose, Test Plan, Test Result) for completeness.
Force-pushed from `6dc55ba` to `00d377b`.
Force-pushed from `aa22cad` to `71b1df4`.
This pull request has merge conflicts that must be resolved before it can be merged.
Ready for review.
`_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account. It seems I need to spend some more time fixing it.
I find it difficult to fully fix `is_cross_encoder`, and I am not familiar with this part of the code, so I will use a temporary solution and document it under Known issues.
How did this issue occur?

In `vllm/model_executor/models/registry.py`, the model is routed to `GemmaForCausalLM`. But `_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account, so it derives that `GemmaForSequenceClassification` has `is_cross_encoder_model == False`, which leads to a wrong calculation method.

If it is instead routed to `Qwen3ForSequenceClassification` in `vllm/model_executor/models/registry.py`, it derives that `GemmaForSequenceClassification` has `is_cross_encoder_model == True`.
Add the following code in `vllm/model_executor/models/registry.py`:
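(The original code block was not preserved on this page; below is a hypothetical sketch of the kind of per-architecture registry entry being described. The dict name and tuple values are assumptions, not the PR's actual code.)

```python
# One explicit entry would be needed for every ForCausalLM architecture
# that can be converted, mapping the converted name to a module/class.
_SEQ_CLS_MODELS = {
    "GemmaForSequenceClassification": ("gemma", "GemmaForCausalLM"),
    "Qwen3ForSequenceClassification": ("qwen3", "Qwen3ForSequenceClassification"),
    # ...and so on for each supported architecture
}
```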
This is clearly too tedious.
```diff
@@ -176,7 +178,8 @@ def as_classification_model(cls: _T) -> _T:
         default_softmax=True,
     )

-    class ModelForClassification(ModelForPooling):
+    class ModelForSequenceClassification(ModelForPooling,
+                                         SupportsCrossEncoding):
```
Why can't `is_cross_encoder_model` return true for this model?
Because this code block was not run in `ModelRegistry.is_cross_encoder_model`:

```python
model_cls, arch = ModelRegistry.resolve_model_cls(architectures)
if model_config.task == "embed":
    model_cls = as_embedding_model(model_cls)
elif model_config.task in ["classify", "score"]:
    model_cls = as_seq_cls_model(model_cls)
elif model_config.task == "reward":
    model_cls = as_reward_model(model_cls)
```
`_ModelRegistry.is_cross_encoder_model` does not take `as_seq_cls_model` into account.
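(For illustration only, a hedged sketch of what taking the adapter into account might look like; this is not the PR's actual fix, and the method signature and helper names are assumptions.)

```python
# Hypothetical: consider that "classify"/"score" tasks wrap the resolved
# class with as_seq_cls_model, which adds SupportsCrossEncoding.
def is_cross_encoder_model(self, architectures, model_config) -> bool:
    model_cls, _ = self.resolve_model_cls(architectures)
    if model_config.task in ("classify", "score"):
        model_cls = as_seq_cls_model(model_cls)
    return supports_cross_encoding(model_cls)
```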
TL;DR
(Last week st scored 0.33437; today it's 0.33702. (╯‵□′)╯︵┻━┻ ) It turned out this was caused by PyStemmer producing slightly different stop words, leading to slightly different BM25 results.
Hopefully, after merging this PR, vLLM can support more LLMs that use the relevance-generation method as classifiers and rerankers.
Usage
converting2seq_cls_models.py:
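(The script body was not preserved on this page. Below is a minimal, hypothetical sketch of the conversion idea based on the linked snippet repo: copy the lm_head rows for the judgment tokens of a ForCausalLM reranker into the score head of a ForSequenceClassification checkpoint. The model name, token choice, head attribute, and output path are assumptions.)

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

model_name = "Qwen/Qwen3-Reranker-0.6B"  # assumed example model
causal_lm = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Caution (see below): "Yes" and "yes" are two different tokens.
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

seq_cls = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1)

# sigmoid(w_yes·h - w_no·h) equals the softmax probability of "yes" over the
# (no, yes) logits, so a single-label score head reproduces the original
# relevance-generation behavior.
with torch.no_grad():
    seq_cls.score.weight.copy_(
        (causal_lm.lm_head.weight[yes_id]
         - causal_lm.lm_head.weight[no_id]).unsqueeze(0))

seq_cls.save_pretrained("./qwen3-reranker-seq-cls")
tokenizer.save_pretrained("./qwen3-reranker-seq-cls")
```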
PTAL #19260
> [!CAUTION]
> "Yes" and "yes" are two different tokens.
requests demo + formatting query & document:
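(The demo body was likewise not preserved. A minimal sketch, assuming vLLM's `/score` endpoint and the Qwen3-Reranker prompt template from its model card; the template strings, host/port, and model name are assumptions, not taken from this PR.)

```python
import requests

PREFIX = ('<|im_start|>system\nJudge whether the Document meets the '
          'requirements based on the Query and the Instruct provided. '
          'Note that the answer can only be "yes" or "no".<|im_end|>\n'
          '<|im_start|>user\n')
SUFFIX = '<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'

def format_pair(instruction: str, query: str, document: str) -> tuple[str, str]:
    """Split the formatted reranker prompt into query side and document side."""
    text_1 = f"{PREFIX}<Instruct>: {instruction}\n<Query>: {query}\n"
    text_2 = f"<Document>: {document}{SUFFIX}"
    return text_1, text_2

text_1, text_2 = format_pair(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of China?",
    "The capital of China is Beijing.",
)

resp = requests.post(
    "http://localhost:8000/score",
    json={"model": "Qwen/Qwen3-Reranker-0.6B",
          "text_1": text_1, "text_2": text_2},
)
print(resp.json())
```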
Expected output:
If someone wants to implement offline conversion from ForCausalLM to ForSequenceClassification, or to support new methods or new models, please refer to https://github.com/noooop/snippet/tree/main/converting2SequenceClassification. (I don't know where to place this code in vLLM.)
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.

Purpose
Follow-up to #11469; further improves #10674.
Test Plan
Test Result
(Optional) Documentation Update
Known issues
`vllm/model_executor/models/registry.py`, lines 160–161 at `6bc7b57`.
In vLLM, the score task only supports `num_labels == 1`, while models with `num_labels == 1` in sentence-transformers use Sigmoid by default (https://github.com/UKPLab/sentence-transformers/blob/910ed144dfc0a08f31517b0d01580302015fa408/sentence_transformers/cross_encoder/CrossEncoder.py#L485-L487). Perhaps we should update the documentation to set `default_softmax=True` when the task is score, consistent with the implementation in sentence-transformers, and pin the sentence-transformers version to >= 4.1.0 (see the note below).
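(For context on why the activation choice matters; this illustration is mine, not from the PR. Softmax over a single logit is identically 1, so a single-label head needs sigmoid; and for two logits, the softmax probability of "yes" equals `sigmoid(yes - no)`, which is what makes the single-label conversion equivalent.)

```python
import torch

logits = torch.tensor([0.3, 1.2])  # (no, yes) logits from an LM head
p_yes_softmax = torch.softmax(logits, dim=-1)[1]
p_yes_sigmoid = torch.sigmoid(logits[1] - logits[0])
assert torch.allclose(p_yes_softmax, p_yes_sigmoid)

# A single logit under softmax is degenerate: always probability 1.
assert torch.softmax(torch.tensor([0.7]), dim=-1).item() == 1.0
```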
Change `verify_and_update_config` into a class method in the future and call it when initializing `model_config`.
Template-aware prompt truncation to avoid cutting off important instructions.
Alibaba-NLP/gte-Qwen2-1.5B-instruct and Alibaba-NLP/gte-modernbert-base fail with `NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for FlashAttentionImpl`.
Fix #19673