
[Usage]: How can I quickly obtain the number of prompt tokens containing multimodal data? #16191

Open
1 task done
yansh97 opened this issue Apr 7, 2025 · 6 comments
Labels
help wanted (Extra attention is needed), multi-modality (Related to multi-modality (#4194)), usage (How to use vllm)

Comments

@yansh97 (Contributor) commented Apr 7, 2025

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

The /tokenize API can only return the number of prompt tokens counting the text and multimodal placeholder tokens; it cannot return the actual number of prompt tokens once the placeholders are expanded to the real multimodal tokens. @DarkLight1337

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@yansh97 yansh97 added the usage How to use vllm label Apr 7, 2025
@DarkLight1337 (Member)

It's not possible yet. Help is welcome!

@DarkLight1337 DarkLight1337 added the help wanted Extra attention is needed label Apr 7, 2025
@DarkLight1337 DarkLight1337 moved this to Planning in Multi-modality Core Apr 7, 2025
@DarkLight1337 (Member)

We need to call the processor from the API server in order to get the multimodal tokens.
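For illustration, here is a minimal sketch of what calling the processor yields, assuming the HuggingFace AutoProcessor for Qwen/Qwen2.5-VL-7B-Instruct rather than vLLM's internal processor interface; the image URL and prompt match the test case later in this thread.

import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What's in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# The processor expands the image placeholder into the actual image tokens,
# so the length of input_ids is the real prompt token count.
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
print("Actual prompt token count:", inputs["input_ids"].shape[1])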

@DarkLight1337 DarkLight1337 moved this from Planning to Todo in Multi-modality Core Apr 7, 2025
@chaunceyjiang (Contributor)

I can try to fix this issue.

@DarkLight1337 DarkLight1337 added the multi-modality Related to multi-modality (#4194) label Apr 7, 2025
@w013nad commented Apr 7, 2025

+1 Would love to have this feature

@chaunceyjiang (Contributor)

import requests

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
url = "http://localhost:8000/tokenize"
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
}]

# Tokenize the chat messages via the /tokenize endpoint.
data = {"model": model_name, "messages": messages}

response = requests.post(url, json=data)
result = response.json()

print("Token IDs:", result["tokens"])
print("Token count:", result["count"])

output:

Token IDs: [151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 3838, 594, 304, 419, 2168, 30, 151652, 151655, 151653, 151645, 198, 151644, 77091, 198]
Token count: 28

I have a test case, as shown above. I understand that the generated token IDs contain image placeholders (like <image>), but they do not include tokens for the actual image content. This issue asks for the actual token IDs, i.e. with the image content expanded, not just the placeholder.

Hi @DarkLight1337, I need some help. I'm not sure if I understand this correctly, but it seems that the patch embeddings of the image itself don't have token IDs; the embeddings are computed inside the model.

How should I handle this?

@DarkLight1337 (Member)

Yes, that's why I said you need to apply the multi-modal processor instead of just the tokenizer.
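To make the distinction concrete: with the HuggingFace processor for Qwen/Qwen2.5-VL-7B-Instruct (an assumption for illustration, not vLLM's internal processor API), the image content is represented by the <|image_pad|> token repeated once per image token, so the actual count can be read from the expanded prompt even though the embeddings themselves are computed inside the model. A hedged sketch:

import requests
from PIL import Image
from transformers import AutoProcessor

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_name)
# <|image_pad|> is the placeholder the processor repeats for each image token.
pad_id = processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What's in this image?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Tokenizer only: the image stays as a single placeholder token (what /tokenize returns today).
tokenizer_ids = processor.tokenizer(prompt)["input_ids"]
# Full processor: the placeholder is repeated once per actual image token.
processor_ids = processor(text=[prompt], images=[image], return_tensors="pt")["input_ids"][0].tolist()

print("Image placeholders (tokenizer only):", tokenizer_ids.count(pad_id))
print("Image tokens (processor):", processor_ids.count(pad_id))
print("Actual prompt token count:", len(processor_ids))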
