[Usage]: How can I quickly obtain the number of prompt tokens containing multimodal data? #16191
Your current environment

How would you like to use vllm

The /tokenize API can only return the number of prompt tokens made up of text and multimodal placeholders; it cannot return the actual number of prompt tokens. @DarkLight1337

Comments
It's not possible yet. Help is welcome!

We need to call the processor from the API server in order to get the multimodal tokens.
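Concretely, "calling the processor" means expanding the image placeholder into its real per-patch token count rather than counting it as a few special tokens. Below is a minimal sketch of that idea using the Hugging Face processor directly; it assumes a transformers version with Qwen2.5-VL support and is an illustration of the concept, not the vLLM API-server code path:

import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What's in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Unlike the tokenizer alone, the processor expands the single image
# placeholder into one token per image patch, so input_ids reflects the
# actual prompt length the model will see.
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
print("Actual prompt tokens:", inputs["input_ids"].shape[1])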
I can try to fix this issue.
+1 Would love to have this feature |
import requests

# Ask a running vLLM OpenAI-compatible server to tokenize a chat request.
model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
url = "http://localhost:8000/tokenize"

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
}]

data = {"model": model_name, "messages": messages}
response = requests.post(url, json=data)
result = response.json()
print("Token IDs:", result["tokens"])
print("Token count:", result["count"])
output:
Token IDs: [151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 3838, 594, 304, 419, 2168, 30, 151652, 151655, 151653, 151645, 198, 151644, 77091, 198]
Token count: 28

I have a test case, as shown above. I understand that the generated token IDs contain image placeholders such as <|image_pad|>. Hi @DarkLight1337, I need some help. I'm not sure if I understand this correctly, but it seems that the image's patch embeddings themselves don't have token IDs; the embedding is produced inside the model. How should I handle this?
Yes, that's why I said you need to apply the multi-modal processor instead of just the tokenizer.
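For reference, in the /tokenize output above the image appears to be represented only by the three special tokens 151652, 151655, 151653 (Qwen2.5-VL's <|vision_start|>, <|image_pad|>, <|vision_end|>), which is why the count stops at 28. Until the server applies the multimodal processor, one workaround is to run the request through vLLM's offline Python API and read prompt_token_ids, since the engine expands the placeholders itself. A sketch, assuming local access to the model weights:

from vllm import LLM, SamplingParams

# Load the model offline; the engine applies the multimodal processor.
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
}]

# Generate a single token just to push the request through end to end.
outputs = llm.chat(messages, SamplingParams(max_tokens=1))

# prompt_token_ids holds the fully expanded prompt, including one token
# per image patch, so its length is the actual prompt token count.
print("Actual prompt tokens:", len(outputs[0].prompt_token_ids))

This is heavyweight for a pure token count, which is why wiring the processor into the API server, as suggested above, is the proper fix.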