[Model][2/N] Automatic conversion of CrossEncoding model #19978


Open · wants to merge 2 commits into main
Conversation

@noooop (Contributor) commented Jun 23, 2025

Score should be an API rather than a task, so let's get rid of the score task.

score task

  1. Definition of the score task in the vLLM documentation

LLM.score
The score method outputs similarity scores between sentence pairs. It is designed for embedding models and cross-encoder models. Embedding models use cosine similarity; cross-encoder models serve as rerankers between candidate query-document pairs in RAG systems.
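
For intuition, here is a conceptual sketch (not vLLM's actual implementation) of the two scoring paths the documentation describes:

import torch
import torch.nn.functional as F

def embedding_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> float:
    # Embedding models: the score is the cosine similarity of the two
    # independently pooled sentence embeddings.
    return F.cosine_similarity(query_emb, doc_emb, dim=-1).item()

def cross_encoder_score(logit: torch.Tensor) -> float:
    # Cross-encoder rerankers: the sentence pair is encoded jointly and the
    # single classification logit (num_labels == 1) is squashed into a score.
    return torch.sigmoid(logit).item()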

  2. In the v0.9.1 documentation, the score task supports the following models:

| Architecture | Models | Example HF Models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base, etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3, etc. |
  3. The score task only supports num_labels == 1:

vllm/vllm/outputs.py, lines 482 to 500 @ 5111642:

@dataclass
class ScoringOutput:
    """The output data of one scoring output of a request.

    Args:
        score: The similarity score, which is a scalar value.
    """
    score: float

    @staticmethod
    def from_base(pooling_output: PoolingOutput):
        pooled_data = pooling_output.data
        if pooled_data.ndim != 0:
            raise ValueError("pooled_data should be a scalar score")
        return ScoringOutput(pooled_data.item())

    def __repr__(self) -> str:
        return f"ScoringOutput(score={self.score})"
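
For illustration only (assuming PoolingOutput can be constructed directly from a tensor), from_base accepts a scalar pooled tensor and rejects anything else:

import torch
from vllm.outputs import PoolingOutput, ScoringOutput

# A scalar pooled value converts cleanly to a score...
ok = ScoringOutput.from_base(PoolingOutput(data=torch.tensor(0.77)))
print(ok)  # ScoringOutput(score=0.77...)

# ...while a 1-D probability vector (num_labels > 1) raises
# "pooled_data should be a scalar score".
ScoringOutput.from_base(PoolingOutput(data=torch.tensor([0.2, 0.8])))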

  4. Summary

score task == embed task + BertLikeModelForSequenceClassification & num_labels == 1?

classify task

  1. Definition of the classify task in the vLLM documentation

LLM.classify
The classify method outputs a probability vector for each prompt. It is primarily designed for classification models.

  2. In the v0.9.1 documentation, the classify task supports the following models:

| Architecture | Models | Example HF Models | LoRA | PP |
|---|---|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev, etc. | ✅︎ | ✅︎ |

If your model is not in the above list, we will try to automatically convert the model using as_classification_model. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token (see the sketch after the example below).

e.g.

Qwen2ForSequenceClassification: jason9693/Qwen2.5-1.5B-apeach
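
A rough sketch of what this conversion does conceptually (score_head is an illustrative name, not vLLM's internal one):

import torch

def classify_last_token(hidden_states: torch.Tensor,
                        score_head: torch.nn.Linear) -> torch.Tensor:
    # hidden_states: [seq_len, hidden_size] for one prompt.
    # Project the hidden state of the last token through the classification
    # head and softmax it to obtain class probabilities.
    logits = score_head(hidden_states[-1])
    return torch.softmax(logits, dim=-1)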

  3. Summary

classify task == *ForSequenceClassification?

More confusing error messages

  1. The classify task does not support BertLikeModelForSequenceClassification & num_labels == 1:
from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="classify")

text_1 = "ping"
text_2 = "pong"

outputs = model.classify([f'{text_1}</s></s>{text_2}'])

print(outputs)

# ValueError: pooled_data should be a 1-D probability vector

This makes users wonder whether BertLikeModelForSequenceClassification & num_labels == 1 can only be used for score tasks.

  2. BertLikeModelForSequenceClassification defaults to the score task, and an error is reported when num_labels > 1:

from vllm import LLM


model = LLM(model="SamLowe/roberta-base-go_emotions")

outputs = model.classify(["I am not having a great day"])

print(outputs)

# ValueError: Classification API is only enabled for `--task classify`

score api

The score API (which outputs similarity scores between sentence pairs) is very useful; we should keep it.

Score should be an API rather than a task, so let's get rid of the score task.

When task_option == "embed", or for *ForSequenceClassification & num_labels == 1, allow users to use the score API.

For compatibility, we still allow users to use --task score, which is resolved as follows:

if self.registry.is_cross_encoder_model(architectures):
    task_option = "classify"
else:
    task_option = "embed"


Purpose

This PR addresses some complex issues found while dealing with #19675:

  • Score should be an API rather than a task, so let's get rid of the score task.
  • Add as_seq_cls_model to make the classification model work correctly.
  • Hitchhiking fixes:
    • Fix test_mteb_score randomness.
      It turned out to be caused by PyStemmer producing slightly different stop words, leading to slightly different BM25 results:

| PyStemmer version | BM25 result |
|---|---|
| 2.2.0.2 | 0.32082 |
| 2.2.0.3 | 0.32082 |
| 3.0.0 | 0.32102 |

Test Plan

pytest tests/test_config.py::test_auto_task
pytest tests/test_config.py::test_score_task

Test Result

Managed to keep all tests passing.

(Optional) Documentation Update

Document everything when the journey ends.

Known issues

  1. Cannot automatically run as_seq_cls_model; otherwise it would cause a circular reference in is_cross_encoder_model.
  2. verify_and_update_config: fixed by [Model][1/N] Automatic conversion of CrossEncoding model #20012
python examples/offline_inference/qwen3_reranker.py
# ValueError: Score API is only enabled for num_labels == 1.

@gemini-code-assist (bot) left a comment:

Summary of Changes

Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request represents the first part of a series aimed at improving the automatic conversion of models, specifically focusing on cross-encoding and scoring capabilities. It involves a significant refactor of the classification adapter, unifying the 'score' task under 'classify' for cross-encoder models, and enhancing the underlying pooling logic. These changes contribute to a more robust and consistent framework for handling various pooling-based model tasks.

Highlights

  • Refactoring and Renaming: The as_classification_model adapter function has been renamed to as_seq_cls_model to better reflect its broader applicability to both sequence classification and scoring tasks. This change is reflected across documentation, tests, and core logic.
  • Task Unification and Re-mapping: The score task is being unified under the classify task for cross-encoder models. For other models, if score is specified, it will now resolve to embed. This streamlines task handling within the system's configuration.
  • Enhanced Adapter Logic: The as_seq_cls_model adapter has been significantly refactored. It now includes a dedicated pooler method that centralizes the logic for applying the classification/scoring head and handles different pooling types. The adapted models are also now explicitly marked as implementing SupportsCrossEncoding.
  • Qwen2 Integration: The Qwen2 model is now explicitly integrated with the new as_seq_cls_model adapter, allowing it to be automatically converted for sequence classification tasks.

@mergify mergify bot added documentation Improvements or additions to documentation qwen Related to Qwen models labels Jun 23, 2025
@noooop noooop changed the title from "[Model][1/N] Automatic conversion of score (CrossEncoding) model. Part 1" to "[Model][1/N] Automatic conversion of CrossEncoding model. Part 1" on Jun 23, 2025


Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@noooop (Contributor, Author) commented Jun 23, 2025

@DarkLight1337

Does this work for you?


Score should be an API rather than a task, so let's get rid of the score task.

When task_option == "embed", or for *ForSequenceClassification & num_labels == 1, allow users to use the score API.

For compatibility, we still allow users to use --task score.

if self.registry.is_cross_encoder_model(architectures):
    task_option = "classify"
else:
    task_option = "embed"

@DarkLight1337 (Member) commented:
I am OK with this change. How about @maxdebayser @22quinn?

@maxdebayser (Contributor) commented:

I think it makes total sense to enable the score API for embedding models. But are we going to disable the /classify API for cross-encoder models? For reference, sentence-transformers doesn't allow you to call CrossEncoder.predict() without text pairs. You can do so with the transformers library, but the results are meaningless.

@noooop (Contributor, Author) commented Jun 23, 2025

@maxdebayser

This PR addresses some complex issues found while dealing with #19675:

from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="score")

text_1 = "ping"
text_2 = "pong"

outputs = model.score(text_1, text_2)

print(outputs)

# [ScoringRequestOutput(request_id='0', outputs=ScoringOutput(score=0.77197265625), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]

from vllm import LLM

model = LLM(model="BAAI/bge-reranker-base", task="classify")

text_1 = "ping"
text_2 = "pong"

# after changing the output dimensions slightly
outputs = model.classify([f'{text_1}</s></s>{text_2}'])

print(outputs)

# [ClassificationRequestOutput(request_id='0', outputs=ClassificationOutput(num_classes=1), prompt_token_ids=[0, 33429, 2, 2, 114007, 2], finished=True)]

If strings are properly concatenated, the score and classify APIs can produce exactly the same results, so we should allow users to use the classify API, especially for LLM-as-reranker models, where the classify API is more flexible.

And actually, the architectures of cross-encoder (reranker) models and classification models are exactly the same, both being *ForSequenceClassification; we cannot distinguish them.
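
Illustratively, assuming the head applies a sigmoid when num_labels == 1 (as rerankers typically do), both APIs surface the same number; the logit below is chosen to roughly reproduce the 0.772 score above:

import torch

logit = torch.tensor([1.219])   # illustrative raw head output for one pair
prob = torch.sigmoid(logit)     # classify API view: length-1 probability vector
score = prob.item()             # score API view: the same value as a scalar
print(prob.tolist(), score)     # both ≈ 0.772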

@maxdebayser (Contributor) commented:

> If strings are properly concatenated, the score and classify APIs can produce exactly the same results, so we should allow users to use the classify API, especially for LLM-as-reranker models, where the classify API is more flexible.

Not in the case of cross-encoder/ms-marco-MiniLM-L-6-v2, for example, because the tokenizer in this case produces token_type_ids along with the input_ids and position_ids (this is only supported in V0 for now; I'm working on V1). If you just concatenate the sentences before giving them to the model, all token_type_ids will be 0, while they should be 0 for the first text and 1 for the second text.

But really, very few models have this behavior. I think it boils down to how prescriptive we want to be over what users can do. Printing a warning could be enough.
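
To make the token_type_ids difference concrete, a quick check with the Hugging Face tokenizer (printed values are illustrative):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Pair encoding: the first text gets token_type_id 0, the second gets 1.
pair = tok("ping", "pong")
print(pair["token_type_ids"])    # e.g. [0, 0, 0, 1, 1]

# Pre-concatenated input: every token gets token_type_id 0, so the model
# receives different inputs even though the text looks equivalent.
concat = tok("ping [SEP] pong")
print(concat["token_type_ids"])  # all zeros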

@noooop (Contributor, Author) commented Jun 23, 2025

Thank you for pointing out that the token_type_ids of the cross-encoder model and the classification model may be different. I'll see if I can find a flag to make that judgment. However, many models use the same base model for fine-tuning both the cross-encoder and the classification model, so it might really be impossible to tell them apart.

@maxdebayser (Contributor) commented:

Agreed, perhaps we'll just have to trust that the users know what they are doing.

@noooop (Contributor, Author) commented Jun 23, 2025

Perhaps we should choose a more appropriate task name than "classify". But this change may be hard to make backward compatible.

noooop added 2 commits June 24, 2025 14:20
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
@mergify mergify bot added the frontend label Jun 24, 2025
@noooop noooop marked this pull request as ready for review June 24, 2025 06:20
@noooop (Contributor, Author) commented Jun 24, 2025

@DarkLight1337

Ready for review

@noooop noooop changed the title from "[Model][1/N] Automatic conversion of CrossEncoding model. Part 1" to "[Model][2/N] Automatic conversion of CrossEncoding model. Part 2" on Jun 24, 2025
@maxdebayser (Contributor) left a comment:
LGTM

@22quinn (Collaborator) commented Jun 24, 2025

This feels like the right direction. I think we can actually consolidate all of these:

"pooling": ["embed", "classify", "score", "reward"],

We can have a single pooling task, and the rest should be treated as APIs, since they are simply different ways of processing the hidden states.

@DarkLight1337 (Member) commented:

The different --task options for pooling models were originally intended as presets so the user doesn't have to manually configure the pooler each time. Looks like that now causes some conflicts when trying to automatically convert the model.

@noooop (Contributor, Author) commented Jun 25, 2025

"embed", "classify", "reward", none of them intersect

The main issue arises from the overlap between "score" and "embed", "classify".

@aarnphm aarnphm changed the title from "[Model][2/N] Automatic conversion of CrossEncoding model. Part 2" to "[Model][2/N] Automatic conversion of CrossEncoding model" on Jun 25, 2025