docs/source/models/supported_models.md (+4 −7)
@@ -541,14 +541,11 @@ You should manually set mean pooling by passing `--override-pooler-config '{"poo
:::

:::{note}
-Unlike base Qwen2, `Alibaba-NLP/gte-Qwen2-7B-instruct` uses bi-directional attention.
-You can set `--hf-overrides '{"is_causal": false}'` to change the attention mask accordingly.
+The HF implementation of `Alibaba-NLP/gte-Qwen2-1.5B-instruct` is hardcoded to use causal attention despite what is shown in `config.json`. To compare vLLM vs HF results,
+you should set `--hf-overrides '{"is_causal": true}'` in vLLM so that the two implementations are consistent with each other.

-On the other hand, its 1.5B variant (`Alibaba-NLP/gte-Qwen2-1.5B-instruct`) uses causal attention
-despite being described otherwise on its model card.
-
-Regardless of the variant, you need to enable `--trust-remote-code` for the correct tokenizer to be
-loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
+For both the 1.5B and 7B variants, you also need to enable `--trust-remote-code` for the correct tokenizer to be loaded.
+See [relevant issue on HF Transformers](https://github.com/huggingface/transformers/issues/34882).
:::

If your model is not in the above list, we will try to automatically convert the model using
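For reference, the two flags in the updated note are intended to be used together when running the model in vLLM. The snippet below is an illustrative sketch rather than part of the documented change: it assumes the offline `LLM` API exposes `trust_remote_code` and `hf_overrides` keyword arguments mirroring the `--trust-remote-code` and `--hf-overrides` CLI flags, and that the `task="embed"` / `LLM.embed()` embedding interface is available in the vLLM version in use.

```python
from vllm import LLM

# Sketch only: kwargs assumed to mirror the CLI flags discussed in the note above.
llm = LLM(
    model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    task="embed",                      # run as an embedding (pooling) model
    trust_remote_code=True,            # load the correct tokenizer (see HF issue 34882)
    hf_overrides={"is_causal": True},  # match HF's hardcoded causal attention for comparison
)

# Embed a prompt and inspect the resulting vector's dimensionality.
(output,) = llm.embed(["What is the capital of France?"])
print(len(output.outputs.embedding))
```

The equivalent when launching a server would be passing `--trust-remote-code --hf-overrides '{"is_causal": true}'` to `vllm serve`, as the note describes.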