Eval bug: OpenAI incompatible image handling in server multimodal #12947


Open
kerlion opened this issue Apr 15, 2025 · 2 comments

Comments


kerlion commented Apr 15, 2025

Name and Version

$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA RTX A6000, compute capability 8.6, VMM: yes
version: 5129 (526739b)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA RTX A6000, compute capability 8.6, VMM: yes

Models

Llama-4-Scout-17B-16E-Instruct

Problem description & steps to reproduce

When I invoke the OpenAI-compatible chat completions API with an image, the server returns a 500 error.

First Bad Commit

500: Failed to parse messages: Unsupported content part type: "image_url"; messages = [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "PLS  desc this pic?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,iVBORw0KGgoxxxxxxxxTkSuQmCC"
        }
      }
    ]
  }
]
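For reference, a request of this shape can be reconstructed with a short script. This is a sketch only: the endpoint URL, port, and the placeholder image bytes are assumptions, not values from the original report.

```python
import base64
import json

# Assumed llama-server endpoint; adjust host/port to your setup.
URL = "http://localhost:8080/v1/chat/completions"

# Placeholder bytes standing in for a real PNG; any base64-encoded image
# payload triggers the same code path on the server.
png_b64 = base64.b64encode(bytes.fromhex("89504e470d0a1a0a")).decode()

payload = {
    "model": "Llama-4-Scout-17B-16E-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "PLS  desc this pic?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{png_b64}"},
                },
            ],
        }
    ],
}

# POSTing this body to URL (e.g. requests.post(URL, json=payload))
# reproduces the 500 "Unsupported content part type: image_url" response
# quoted above.
print(json.dumps(payload)[:60])
```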

Relevant log output

got exception: {"code":500,"message":"Failed to parse messages: Unsupported content part type: \"image_url\"; messages = [\n  {\n    \"role\": \"user\",\n    \"content\": [\n      {\n        \"type\": \"text\",\n        \"text\": \"PLS  desc this pic?\"\n      },\n      {\n        \"type\": \"image_url\",\n        \"image_url\": {\n          \"url\": \"data:image/xxxxx\"\n        }\n      }\n    ]\n  }\n]","type":"server_error"}
srv  log_server_r: request: POST /v1/chat/completions 10.13.23.105 500
@betweenus

Hi. llama-server supports only text input.


Fr0d0Beutl1n commented Apr 19, 2025

Then what is image_data? It sounds very similar, if not the same:

image_data: An array of objects holding base64-encoded image data and the IDs used to reference them in the prompt. You can determine the place of the image in the prompt as in the following: USER:[img-12]Describe the image in detail.\nASSISTANT:. In this case, [img-12] will be replaced by the embeddings of the image with id 12 in the accompanying image_data array: {..., "image_data": [{"data": "<BASE64_STRING>", "id": 12}]}. Use image_data only with multimodal models, e.g., LLaVA.

https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion
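The quoted README paragraph describes llama.cpp's own (non-OpenAI) /completion endpoint. A minimal sketch of that payload shape, assuming a local server; "<BASE64_STRING>" is left as a placeholder for real base64 image bytes:

```python
import json

# Assumed llama-server /completion endpoint (legacy, non-OpenAI API).
URL = "http://localhost:8080/completion"

payload = {
    # [img-12] marks where the image embeddings are spliced into the prompt.
    "prompt": "USER:[img-12]Describe the image in detail.\nASSISTANT:",
    # The id here must match the [img-N] tag in the prompt.
    "image_data": [{"data": "<BASE64_STRING>", "id": 12}],
}

# POST with e.g. requests.post(URL, json=payload) against a multimodal model.
print(json.dumps(payload)[:40])
```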
