[Model][2/N] Automatic conversion of CrossEncoding model #19978


Open · wants to merge 2 commits into main
Conversation

@noooop (Contributor) commented Jun 23, 2025

Score should be an API rather than a task, so let's get rid of the score task.

score task

  1. Definition of the score task in the vLLM documentation

LLM.score
The score method outputs similarity scores between sentence pairs. It is designed for embedding models and cross-encoder models. Embedding models use cosine similarity; cross-encoder models serve as rerankers between candidate query-document pairs in RAG systems.
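
For intuition, here is a conceptual sketch (not vLLM's actual implementation) of the two scoring paths the documentation describes:

import torch
import torch.nn.functional as F

def embedding_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> float:
    # Embedding models: the score is the cosine similarity of the two
    # independently pooled sentence embeddings.
    return F.cosine_similarity(query_emb, doc_emb, dim=-1).item()

def cross_encoder_score(logit: torch.Tensor) -> float:
    # Cross-encoder rerankers: the sentence pair is encoded jointly and the
    # single classification logit (num_labels == 1) is squashed into a score.
    return torch.sigmoid(logit).item()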

  2. In the v0.9.1 documentation, the score task supports the following models:

| Architecture | Models | Example HF Models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base, etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3, etc. |
  3. The score task only supports num_labels == 1:

vllm/vllm/outputs.py, lines 482 to 500 @ 5111642:

@dataclass
class ScoringOutput:
    """The output data of one scoring output of a request.

    Args:
        score: The similarity score, which is a scalar value.
    """
    score: float

    @staticmethod
    def from_base(pooling_output: PoolingOutput):
        pooled_data = pooling_output.data
        if pooled_data.ndim != 0:
            raise ValueError("pooled_data should be a scalar score")
        return ScoringOutput(pooled_data.item())

    def __repr__(self) -> str:
        return f"ScoringOutput(score={self.score})"
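
For illustration only (assuming PoolingOutput can be constructed directly from a tensor), from_base accepts a scalar pooled tensor and rejects anything else:

import torch
from vllm.outputs import PoolingOutput, ScoringOutput

# A scalar pooled value converts cleanly to a score...
ok = ScoringOutput.from_base(PoolingOutput(data=torch.tensor(0.77)))
print(ok)  # ScoringOutput(score=0.77...)

# ...while a 1-D probability vector (num_labels > 1) raises
# "pooled_data should be a scalar score".
ScoringOutput.from_base(PoolingOutput(data=torch.tensor([0.2, 0.8])))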

  4. Summary

score task == embed task + BertLikeModelForSequenceClassification & num_labels == 1?

classify task

  1. Definition of the classify task in the vLLM documentation

LLM.classify
The classify method outputs a probability vector for each prompt. It is primarily designed for classification models.

  2. In the v0.9.1 documentation, the classify task supports the following models:

| Architecture | Models | Example HF Models | LoRA | PP |
|---|---|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev, etc. | ✅︎ | ✅︎ |

If your model is not in the above list, we will try to automatically convert the model using as_classification_model. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token (see the sketch after the example below).

e.g.

Qwen2ForSequenceClassification: jason9693/Qwen2.5-1.5B-apeach
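
A rough sketch of what this conversion does conceptually (score_head is an illustrative name, not vLLM's internal one):

import torch

def classify_last_token(hidden_states: torch.Tensor,
                        score_head: torch.nn.Linear) -> torch.Tensor:
    # hidden_states: [seq_len, hidden_size] for one prompt.
    # Project the hidden state of the last token through the classification
    # head and softmax it to obtain class probabilities.
    logits = score_head(hidden_states[-1])
    return torch.softmax(logits, dim=-1)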

  3. Summary

classify task == *ForSequenceClassification?

More confusing error messages

  1. The classify task does not support BertLikeModelForSequenceClassification & num_labels == 1:
from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="classify")

text_1 = "ping"
text_2 = "pong"

outputs = model.classify([f'{text_1}</s></s>{text_2}'])

print(outputs)

# ValueError: pooled_data should be a 1-D probability vector

This makes users wonder whether BertLikeModelForSequenceClassification & num_labels == 1 can only be used for score tasks.

  2. BertLikeModelForSequenceClassification defaults to the score task, and an error is reported when num_labels > 1:

from vllm import LLM


model = LLM(model="SamLowe/roberta-base-go_emotions")

outputs = model.classify(["I am not having a great day"])

print(outputs)

# ValueError: Classification API is only enabled for `--task classify`

score api

The score API (which outputs similarity scores between sentence pairs) is very useful; we should keep it.

Score should be an API rather than a task, so let's get rid of the score task.

When task_option == "embed", or for *ForSequenceClassification & num_labels == 1, allow users to use the score API.

For compatibility, we still allow users to use --task score, which is resolved as follows:

if self.registry.is_cross_encoder_model(architectures):
    task_option = "classify"
else:
    task_option = "embed"


Purpose

This PR addresses some complex issues found while dealing with #19675:

  • Score should be an API rather than a task, so let's get rid of the score task.
  • Add as_seq_cls_model to make the classification model work correctly.
  • Hitchhiking fixes:
    • Fix test_mteb_score randomness.
      It turned out to be caused by PyStemmer producing slightly different stop words, leading to slightly different BM25 results:

| PyStemmer version | BM25 result |
|---|---|
| 2.2.0.2 | 0.32082 |
| 2.2.0.3 | 0.32082 |
| 3.0.0 | 0.32102 |

Test Plan

pytest tests/test_config.py::test_auto_task
pytest tests/test_config.py::test_score_task

Test Result

Managed to keep all tests passing.

(Optional) Documentation Update

Document everything when the journey ends.

Known issues

  1. Cannot automatically run as_seq_cls_model; otherwise it would cause a circular reference in is_cross_encoder_model.
  2. verify_and_update_config: fixed by [Model][1/N] Automatic conversion of CrossEncoding model #20012
python examples/offline_inference/qwen3_reranker.py
# ValueError: Score API is only enabled for num_labels == 1.

@gemini-code-assist (bot) left a comment:

Summary of Changes

Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents the first part of a series aimed at improving the automatic conversion of models, specifically focusing on cross-encoding and scoring capabilities. It involves a significant refactor of the classification adapter, unifying the 'score' task under 'classify' for cross-encoder models, and enhancing the underlying pooling logic. These changes contribute to a more robust and consistent framework for handling various pooling-based model tasks.

Highlights

  • Refactoring and Renaming: The as_classification_model adapter function has been renamed to as_seq_cls_model to better reflect its broader applicability to both sequence classification and scoring tasks. This change is reflected across documentation, tests, and core logic.
  • Task Unification and Re-mapping: The score task is being unified under the classify task for cross-encoder models. For other models, if score is specified, it will now resolve to embed. This streamlines task handling within the system's configuration.
  • Enhanced Adapter Logic: The as_seq_cls_model adapter has been significantly refactored. It now includes a dedicated pooler method that centralizes the logic for applying the classification/scoring head and handles different pooling types. The adapted models are also now explicitly marked as implementing SupportsCrossEncoding.
  • Qwen2 Integration: The Qwen2 model is now explicitly integrated with the new as_seq_cls_model adapter, allowing it to be automatically converted for sequence classification tasks.

@mergify mergify bot added documentation Improvements or additions to documentation qwen Related to Qwen models labels Jun 23, 2025
@noooop noooop changed the title from "[Model][1/N] Automatic conversion of score (CrossEncoding) model. Part 1" to "[Model][1/N] Automatic conversion of CrossEncoding model. Part 1" on Jun 23, 2025


Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@noooop (Contributor, Author) commented Jun 23, 2025

@DarkLight1337

Does this work for you?


Score should be an API rather than a task, so let's get rid of the score task.

When task_option == "embed", or for *ForSequenceClassification & num_labels == 1, allow users to use the score API.

For compatibility, we still allow users to use --task score.

if self.registry.is_cross_encoder_model(architectures):
    task_option = "classify"
else:
    task_option = "embed"

@DarkLight1337 (Member) commented:
I am OK with this change. How about @maxdebayser @22quinn?

@maxdebayser (Contributor) commented:

I think it makes total sense to enable the score API for embedding models. But are we going to disable the /classify API for cross-encoder models? For reference, sentence-transformers doesn't allow you to call CrossEncoder.predict() without text pairs. You can do so with the transformers library, but the results are meaningless.

@noooop (Contributor, Author) commented Jun 23, 2025

@maxdebayser

This PR addresses some complex issues found while dealing with #19675:

from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="score")

text_1 = "ping"
text_2 = "pong"

outputs = model.score(text_1, text_2)

print(outputs)

# [ScoringRequestOutput(request_id='0', outputs=ScoringOutput(score=0.77197265625), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]

from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="classify")

text_1 = "ping"
text_2 = "pong"

# after changing the output dimensions slightly
outputs = model.classify([f'{text_1}</s></s>{text_2}'])

print(outputs)

# [ClassificationRequestOutput(request_id='0', outputs=ClassificationOutput(num_classes=1), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]

If strings are properly concatenated, the score and classify APIs can produce exactly the same results, so we should allow users to use the classify API, especially for LLM-as-reranker models, where the classify API is more flexible.

And actually, the architectures of cross-encoder (reranker) models and classification models are exactly the same, both being *ForSequenceClassification; we cannot distinguish them.
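
Illustratively, assuming the head applies a sigmoid when num_labels == 1 (as rerankers typically do), both APIs surface the same number; the logit below is chosen to roughly reproduce the 0.772 score above:

import torch

logit = torch.tensor([1.219])   # illustrative raw head output for one pair
prob = torch.sigmoid(logit)     # classify API view: length-1 probability vector
score = prob.item()             # score API view: the same value as a scalar
print(prob.tolist(), score)     # both ≈ 0.772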

@maxdebayser (Contributor) commented:

> If strings are properly concatenated, the score and classify APIs can produce exactly the same results, so we should allow users to use the classify API, especially for LLM-as-reranker models, where the classify API is more flexible.

Not in the case of cross-encoder/ms-marco-MiniLM-L-6-v2, for example, because the tokenizer in this case produces token_type_ids along with the input_ids and position_ids (this is only supported in V0 for now; I'm working on V1). If you just concatenate the sentences before giving them to the model, all token_type_ids will be 0, while they should be 0 for the first text and 1 for the second text.

But really, very few models have this behavior. I think it boils down to how prescriptive we want to be over what users can do. Printing a warning could be enough.
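
To make the token_type_ids difference concrete, a quick check with the Hugging Face tokenizer (printed values are illustrative):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Pair encoding: the first text gets token_type_id 0, the second gets 1.
pair = tok("ping", "pong")
print(pair["token_type_ids"])    # e.g. [0, 0, 0, 1, 1]

# Pre-concatenated input: every token gets token_type_id 0, so the model
# receives different inputs even though the text looks equivalent.
concat = tok("ping [SEP] pong")
print(concat["token_type_ids"])  # all zeros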

@noooop (Contributor, Author) commented Jun 23, 2025

Thank you for pointing out that the token_type_ids of the cross-encoder model and the classification model may be different. I'll see if I can find a flag to make that judgment. However, many models use the same base model for fine-tuning both the cross-encoder and the classification model, so it might really be impossible to tell them apart.

@maxdebayser (Contributor) commented:

Agreed, perhaps we'll just have to trust that the users know what they are doing.

@noooop (Contributor, Author) commented Jun 23, 2025

Perhaps we should choose a more appropriate task name than "classify". But this change may be hard to make backward compatible.

noooop added 2 commits June 24, 2025 14:20
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
@mergify mergify bot added the frontend label Jun 24, 2025
@noooop noooop marked this pull request as ready for review June 24, 2025 06:20
@noooop (Contributor, Author) commented Jun 24, 2025

@DarkLight1337

Ready for review

@noooop noooop changed the title from "[Model][1/N] Automatic conversion of CrossEncoding model. Part 1" to "[Model][2/N] Automatic conversion of CrossEncoding model. Part 2" on Jun 24, 2025
@maxdebayser (Contributor) left a comment:
LGTM

@22quinn (Collaborator) commented Jun 24, 2025

This feels like the right direction. I think we can actually consolidate all of these:

"pooling": ["embed", "classify", "score", "reward"],

We can have a single pooling task, and the rest should be treated as APIs, since they are simply different ways of processing the hidden states.

@DarkLight1337 (Member) commented:

The different --task options for pooling models were originally intended as presets so the user doesn't have to manually configure the pooler each time. Looks like that now causes some conflicts when trying to automatically convert the model.

@noooop (Contributor, Author) commented Jun 25, 2025

"embed", "classify", "reward", none of them intersect

The main issue arises from the overlap between "score" and "embed", "classify".

@aarnphm aarnphm changed the title from "[Model][2/N] Automatic conversion of CrossEncoding model. Part 2" to "[Model][2/N] Automatic conversion of CrossEncoding model" on Jun 25, 2025