1. Added `Pooling` support to the embedding model
2. Added multiple seq_len support for the embedding model using `QEFFAutoModel`
3. Added tests for pooling and multiple seq_len
---------
Signed-off-by: Amit Raj <[email protected]>
Signed-off-by: Abukhoyer Shaik <[email protected]>
Co-authored-by: Abukhoyer Shaik <[email protected]>
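The commit message mentions multiple seq_len support but does not show how a length is chosen at run time. The following is a minimal sketch of one plausible approach, not taken from the QEfficient code: pick the smallest compiled sequence length that fits the input and pad up to it. The helper names `pick_seq_len` and `pad_to`, and the `pad_id=0` default, are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def pick_seq_len(compiled_seq_lens: Sequence[int], input_len: int) -> int:
    """Return the smallest compiled sequence length that can hold the input.

    Hypothetical helper: assumes the model was compiled for several fixed
    lengths and that any input must be padded up to one of them.
    """
    buckets = sorted(compiled_seq_lens)
    idx = bisect_left(buckets, input_len)
    if idx == len(buckets):
        raise ValueError(
            f"input length {input_len} exceeds largest compiled length {buckets[-1]}"
        )
    return buckets[idx]


def pad_to(
    token_ids: Sequence[int], target_len: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Right-pad token ids to target_len and build the matching attention mask."""
    n = len(token_ids)
    if n > target_len:
        raise ValueError("input is longer than the target length")
    pad = target_len - n
    return list(token_ids) + [pad_id] * pad, [1] * n + [0] * pad


if __name__ == "__main__":
    seq_len = pick_seq_len([32, 64, 128], input_len=40)
    print(seq_len)  # 64
    ids, mask = pad_to([7, 8, 9], target_len=5)
    print(ids, mask)  # [7, 8, 9, 0, 0] [1, 1, 1, 0, 0]
```

Choosing the smallest fitting length keeps padding, and therefore wasted compute, to a minimum; the attention mask lets a later pooling step ignore the padded positions.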
  This method serves as the easiest entry point into using QEfficient. The interface is designed to be similar to transformers.AutoModel.
  Once the model is initialized, you can use other methods such as export, compile, and generate on the same object.

  This API can also be used as exception for VLM model since transformers support loading InternChatVL models via AutoModel API we support it via AutoModelForCausalLM API
  Args:
-     :pretrained_name_or_path (str): Model card name from HuggingFace or local path to model directory.
-     :args, kwargs: Additional arguments to pass to transformers.AutoModel.
+     pretrained_model_name_or_path (str): The name or path of the pre-trained model.
+     pooling (Optional[Union[str, Callable]], optional): The pooling method to use. Defaults to None.
+         Options:
+             - "mean": Mean pooling
+             - "max": Max pooling
+             - "cls": CLS token pooling
+             - "avg": Average pooling
+             - Callable: A custom pooling function
+             - None: No pooling applied

  .. code-block:: python

      from QEfficient import QEFFAutoModel
      from transformers import AutoTokenizer

      # Initialize the model using from_pretrained similar to transformers.AutoModel.
-     model = QEFFAutoModel.from_pretrained("model_name")
+     model = QEFFAutoModel.from_pretrained("model_name", pooling="mean")

      # Now you can directly compile the model for Cloud AI 100
      model.compile(num_cores=16) # Considering you have a Cloud AI 100 SKU
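To make the pooling options in the docstring concrete, here is a dependency-free sketch of what "mean", "max", and "cls" pooling commonly compute over one sequence of token vectors. It is an illustration under assumptions, not the QEfficient implementation: the function names are hypothetical, and treating the attention mask as a filter on padded tokens is a common convention that the diff does not confirm.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # shape: (seq_len, hidden_size)


def _unmasked_rows(hidden: Matrix, mask: Sequence[int]) -> Matrix:
    """Keep only the token vectors whose attention-mask entry is non-zero."""
    kept = [row for row, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    return kept


def mean_pool(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """Average the unmasked token vectors, dimension by dimension."""
    kept = _unmasked_rows(hidden, mask)
    return [sum(col) / len(kept) for col in zip(*kept)]


def max_pool(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """Take the element-wise maximum over the unmasked token vectors."""
    kept = _unmasked_rows(hidden, mask)
    return [max(col) for col in zip(*kept)]


def cls_pool(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """Return the first token's vector (the CLS position in BERT-style models)."""
    return list(hidden[0])


if __name__ == "__main__":
    hidden_states = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]  # last row is padding
    attention_mask = [1, 1, 0]
    print(mean_pool(hidden_states, attention_mask))  # [2.0, 3.0]
    print(max_pool(hidden_states, attention_mask))   # [3.0, 4.0]
    print(cls_pool(hidden_states, attention_mask))   # [1.0, 4.0]
```

Each function reduces a `(seq_len, hidden_size)` matrix to a single `hidden_size` vector, which is the fixed-size sentence embedding a caller usually wants.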
  Apply a pooling transformation to the model. This transformation appends a pooling layer to the model, allowing for the reduction of spatial dimensions in the output.
+ The pooling layer can be configured to use different pooling methods, such as max pooling or average pooling.
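The idea of appending a configurable pooling stage to a model can be sketched as a thin wrapper that accepts a method name, a custom callable, or None. This is a hypothetical illustration and not the transform added in this commit: `PooledEncoder`, `BUILTIN_POOLERS`, `toy_encoder`, and `last_token_pool` are invented names, and only two built-in methods are included to keep the example short.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union

Matrix = List[List[float]]  # shape: (seq_len, hidden_size)
PoolFn = Callable[[Matrix, Sequence[int]], List[float]]


def _cls(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """First-token pooling."""
    return list(hidden[0])


def _mean(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """Mean over unmasked tokens."""
    kept = [row for row, m in zip(hidden, mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Hypothetical registry of named pooling methods.
BUILTIN_POOLERS: Dict[str, PoolFn] = {"cls": _cls, "mean": _mean}


class PooledEncoder:
    """Hypothetical wrapper: run a base encoder, then an optional pooling step."""

    def __init__(
        self,
        encoder: Callable[[Sequence[int]], Matrix],
        pooling: Optional[Union[str, PoolFn]] = None,
    ) -> None:
        self.encoder = encoder
        if pooling is None or callable(pooling):
            self.pool = pooling  # None disables pooling; a callable is used as-is
        elif pooling in BUILTIN_POOLERS:
            self.pool = BUILTIN_POOLERS[pooling]
        else:
            raise ValueError(f"unknown pooling method: {pooling!r}")

    def __call__(
        self, token_ids: Sequence[int], mask: Sequence[int]
    ) -> Union[Matrix, List[float]]:
        hidden = self.encoder(token_ids)
        return hidden if self.pool is None else self.pool(hidden, mask)


def toy_encoder(token_ids: Sequence[int]) -> Matrix:
    """Stand-in encoder: maps each token id to a 2-dimensional vector."""
    return [[float(t), float(t) * 2.0] for t in token_ids]


def last_token_pool(hidden: Matrix, mask: Sequence[int]) -> List[float]:
    """Custom pooling: the vector of the last unmasked token."""
    last = max(i for i, m in enumerate(mask) if m)
    return list(hidden[last])


if __name__ == "__main__":
    ids, mask = [1, 3, 5], [1, 1, 0]
    print(PooledEncoder(toy_encoder, pooling="mean")(ids, mask))           # [2.0, 4.0]
    print(PooledEncoder(toy_encoder, pooling=last_token_pool)(ids, mask))  # [3.0, 6.0]
    print(PooledEncoder(toy_encoder)(ids, mask))  # unpooled per-token vectors
```

Resolving the pooling argument once in the constructor keeps the call path simple and makes an unknown method name fail early rather than at inference time.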