
Commit af295e9

[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (vllm-project#14609)
Signed-off-by: DarkLight1337 <[email protected]>
Parent: a1c8f37

2 files changed, 6 insertions(+), 9 deletions(-)

docs/source/models/supported_models.md (+4 −7)

```diff
@@ -541,14 +541,11 @@ You should manually set mean pooling by passing `--override-pooler-config '{"poo
 :::
 
 :::{note}
-Unlike base Qwen2, `Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
+The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
+you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.
 
-On the other hand, its 1.5B variant (`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
-despite being described otherwise on its model card.
-
-Regardless of the variant, you need to enable `--trust-remote-code` for the correct tokenizer to be
-loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
+For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
 :::
 
 If your model is not in the above list, we will try to automatically convert the model using
```
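For reference, the documented flags map onto vLLM's offline Python API as well. Below is a minimal sketch, assuming vLLM's `LLM` entry point from around this commit, whose `hf_overrides` and `trust_remote_code` keyword arguments mirror the `--hf-overrides` and `--trust-remote-code` CLI flags; the prompt is illustrative.

```python
# Minimal sketch of the documented workaround via vLLM's offline API.
# Assumes a vLLM version from around this commit; `hf_overrides` and
# `trust_remote_code` mirror the CLI flags named in the note above.
from vllm import LLM

llm = LLM(
    model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    task="embed",                      # run the model as an embedding model
    hf_overrides={"is_causal": True},  # match HF's hardcoded causal attention
    trust_remote_code=True,            # load the correct tokenizer (see HF issue above)
)

outputs = llm.embed(["What is the capital of France?"])
print(len(outputs[0].outputs.embedding))  # embedding dimensionality
```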

tests/models/embedding/language/test_embedding.py (+2 −2)

```diff
@@ -42,8 +42,8 @@ def test_models(
     if model == "ssmits/Qwen2-7B-Instruct-embed-base":
         vllm_extra_kwargs["override_pooler_config"] = \
             PoolerConfig(pooling_type="MEAN")
-    if model == "Alibaba-NLP/gte-Qwen2-7B-instruct":
-        vllm_extra_kwargs["hf_overrides"] = {"is_causal": False}
+    if model == "Alibaba-NLP/gte-Qwen2-1.5B-instruct":
+        vllm_extra_kwargs["hf_overrides"] = {"is_causal": True}
 
     # The example_prompts has ending "\n", for example:
     # "Write a short story about a robot that dreams for the first time.\n"
```
