
[Usage]: Llama4 tool parser #16214


Closed
1 task done
dhruvmullick opened this issue Apr 7, 2025 · 5 comments
Labels
usage How to use vllm

Comments

@dhruvmullick

Your current environment

Is there any particular parser we should use for parsing tool calls with Llama4?

I'm also wondering whether the Llama 3.1 parser is suitable here.

How would you like to use vllm

Use tools with Llama4

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@sangmandu

222

@jhuntbach-bc

jhuntbach-bc commented Apr 9, 2025

I tried using --enable-auto-tool-choice --tool-call-parser=llama3_json, but I'm getting mixed results. Sometimes tool calling works; other times I just get a plain text content response like this: <|python_start|>{"name": "brave_web_search", "parameters": {"query": "latest news"}}<|python_end|>
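If switching parsers is not an option, one client-side workaround is to recover the call from the returned text content. The sketch below is a hypothetical helper (extract_marker_tool_call is not part of vLLM) and assumes the content has exactly the <|python_start|>{...}<|python_end|> shape shown above, with a single JSON object carrying "name" and "parameters" keys.

```python
import json
import re

# Hypothetical helper: matches the first <|python_start|>...<|python_end|> span.
_MARKED_CALL = re.compile(r"<\|python_start\|>(.*?)<\|python_end\|>", re.DOTALL)


def extract_marker_tool_call(content: str):
    """Return (name, arguments_dict), or None if no well-formed call is found."""
    match = _MARKED_CALL.search(content)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        return None
    # The output above uses "parameters"; fall back to "arguments" just in case.
    args = payload.get("parameters", payload.get("arguments", {}))
    return payload["name"], args


content = ('<|python_start|>{"name": "brave_web_search", '
           '"parameters": {"query": "latest news"}}<|python_end|>')
print(extract_marker_tool_call(content))
# ('brave_web_search', {'query': 'latest news'})
```

This only papers over the symptom on the client; it does not make the server report the response as a tool call.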

@jhuntbach-bc

I also tried setting --chat-template=examples/tool_chat_template_llama3.1_json.jinja, and it made no difference to the results I was seeing.

@hansuelijud

I've had some luck with the template and tool parser below, adapted from the Llama 3.2 ones. It's not perfect, but it seems to work; maybe it helps as a starting point. Note: I have not yet tested the chat template with images. Image handling is incomplete and is likely to break.

Tool parser:

import ast
import json
import re
import uuid
from collections.abc import Sequence
from typing import Any, Dict, List, Optional, Tuple, Union

from transformers import PreTrainedTokenizerBase

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                               DeltaFunctionCall, DeltaMessage,
                                               DeltaToolCall,
                                               ExtractedToolCallInformation,
                                               FunctionCall, ToolCall)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
    ToolParser, ToolParserManager)
from vllm.logger import init_logger

logger = init_logger(__name__)


# Helper exception
class _UnexpectedFormatError(Exception):
    pass


# Helper function to parse argument values from AST nodes
def _get_parameter_value(val: ast.expr) -> Any:

    if isinstance(val, ast.Constant): return val.value
    elif isinstance(val, ast.Dict):
        keys = []; values = []
        for k in val.keys:
             if isinstance(k, ast.Constant) and isinstance(k.value, str): keys.append(k.value)
             else: raise _UnexpectedFormatError(f"Dict keys must be strings. Got {type(k)}")
        for v in val.values: values.append(_get_parameter_value(v))
        return dict(zip(keys, values))
    elif isinstance(val, ast.List): return [_get_parameter_value(v) for v in val.elts]
    elif isinstance(val, ast.Tuple): return [_get_parameter_value(v) for v in val.elts] # Treat tuple as list
    elif isinstance(val, ast.UnaryOp) and isinstance(val.op, ast.USub) and isinstance(val.operand, ast.Constant) and isinstance(val.operand.value, (int, float)): return -val.operand.value
    else: raise _UnexpectedFormatError(f"Unsupported arg type {type(val)}: '{ast.dump(val)}'")


# Helper function to handle a single AST Call node (for func(arg=val) format)
def _handle_ast_call_node(call: ast.Call) -> ToolCall:

    if not isinstance(call.func, ast.Name): raise _UnexpectedFormatError("Tool call func must be Name")
    function_name = call.func.id
    arguments = {}
    for keyword in call.keywords:
        if keyword.arg is None: raise _UnexpectedFormatError("Args must be keyword args")
        try: arguments[keyword.arg] = _get_parameter_value(keyword.value)
        except _UnexpectedFormatError as e: raise _UnexpectedFormatError(f"Invalid arg '{keyword.arg}' in '{function_name}': {e}") from e
        except Exception as e: raise _UnexpectedFormatError(f"Error parsing arg '{keyword.arg}' in '{function_name}': {e}") from e
    tool_call_id = f"call_{uuid.uuid4()}"; arguments_json = json.dumps(arguments)
    return ToolCall(id=tool_call_id, type="function", function=FunctionCall(name=function_name, arguments=arguments_json))

# Helper function to handle the dictionary format {'type': 'function', ...}
def _handle_dict_format(call_dict: Dict[str, Any]) -> ToolCall:

    if not isinstance(call_dict, dict): raise _UnexpectedFormatError(f"Expected dict, got {type(call_dict)}")
    call_type = call_dict.get("type"); func_details = call_dict.get("function", call_dict)
    if call_type != "function": raise _UnexpectedFormatError(f"Expected type 'function', got '{call_type}'")
    func_name = func_details.get("name")
    if not isinstance(func_name, str): raise _UnexpectedFormatError(f"Expected string name, got {type(func_name)}")
    parameters = func_details.get("parameters", func_details.get("arguments"))
    if not isinstance(parameters, dict):
         if parameters is None: parameters = {}
         else: raise _UnexpectedFormatError(f"Expected dict parameters, got {type(parameters)}")
    arguments_json = json.dumps(parameters)
    tool_call_id = f"call_{uuid.uuid4()}"
    return ToolCall(id=tool_call_id, type="function", function=FunctionCall(name=func_name, arguments=arguments_json))



def _make_valid_python(text: str) -> Union[tuple[str, str], None]:

    bracket_stack = []; in_single_quotes = False; in_double_quotes = False; escaped = False
    for index, char in enumerate(text):
        if escaped: escaped = False; continue
        if char == '\\': escaped = True; continue
        if char == "'" and not in_double_quotes:
            if not in_single_quotes: bracket_stack.append("'"); in_single_quotes = True
            elif bracket_stack and bracket_stack[-1] == "'": bracket_stack.pop(); in_single_quotes = False
            else: return None
        elif char == '"' and not in_single_quotes:
            if not in_double_quotes: bracket_stack.append('"'); in_double_quotes = True
            elif bracket_stack and bracket_stack[-1] == '"': bracket_stack.pop(); in_double_quotes = False
            else: return None
        elif not in_single_quotes and not in_double_quotes:
            if char in {"[", "(", "{"}: bracket_stack.append(char)
            elif char == "]":
                if not bracket_stack or bracket_stack.pop() != "[": return None
            elif char == ")":
                if not bracket_stack or bracket_stack.pop() != "(": return None
            elif char == "}":
                if not bracket_stack or bracket_stack.pop() != "{": return None
    clean_text = text.rstrip(); added_text = ""
    if not clean_text: return None
    if clean_text.endswith(("=", ":", ",")): return None
    if re.search(r'\b(Tru|Fals|Non)$', clean_text): return None
    closing_map = {"[": "]", "(": ")", "{": "}", "'": "'", '"': '"'}
    for char in reversed(bracket_stack): added_text += closing_map.get(char, "")
    return clean_text + added_text, added_text



def _compute_tool_delta(previously_sent_args: str, new_call: ToolCall,
                        index: int) -> Union[DeltaToolCall, None]:

    new_call_args = new_call.function.arguments if new_call.function.arguments is not None else ""
    if not previously_sent_args:
        return DeltaToolCall(id=new_call.id, index=index, type="function",
                             function=DeltaFunctionCall(name=new_call.function.name, arguments=new_call_args))
    elif previously_sent_args == "{}" and new_call_args != "{}":
         return DeltaToolCall(index=index, function=DeltaFunctionCall(arguments=new_call_args))
    elif len(new_call_args) > len(previously_sent_args) and new_call_args.startswith(previously_sent_args):
        arg_diff = new_call_args[len(previously_sent_args):]
        if arg_diff: return DeltaToolCall(index=index, function=DeltaFunctionCall(arguments=arg_diff))
        else: return None
    elif new_call_args == previously_sent_args: return None
    else: logger.warning(f"Arg stream mismatch: New args '{new_call_args}' !startswith '{previously_sent_args}'"); return None


@ToolParserManager.register_module("llama4_pythonic")
class Llama4PythonicToolParser(ToolParser):
    """
    Tool call parser for Llama 4 models. Handles both Pythonic call lists
    ([f(a=1), g(b="x")]) and lists of dict-style calls, preferring content
    between <|python_start|> and <|python_end|> markers. Also tolerates a
    trailing semicolon ("];") and a single call without list brackets.
    """
    TOOL_CALL_START_MARKER = "<|python_start|>"
    TOOL_CALL_END_MARKER = "<|python_end|>"

    def __init__(self, tokenizer: PreTrainedTokenizerBase):
        super().__init__(tokenizer)
        # Ensure prev_tool_call_arr is initialized as list
        self.prev_tool_call_arr: list[dict] = []
        self.current_tool_index = 0
        self.streamed_args_for_tool: List[str] = []
        self.in_tool_call_stream = False
        self.stream_buffer = ""

    def _extract_content_between_markers(self, text: str) -> Optional[str]:
        """Extracts content between start and end markers."""

        start_idx = text.find(self.TOOL_CALL_START_MARKER)
        if start_idx == -1: return None
        start_idx += len(self.TOOL_CALL_START_MARKER)
        end_idx = text.find(self.TOOL_CALL_END_MARKER, start_idx)
        if end_idx == -1: return None
        return text[start_idx:end_idx]

    def _clean_and_parse(self, tool_str: str) -> List[ToolCall]:
        """
        Cleans trailing semicolon and attempts parsing using both strategies.
        Returns list of ToolCall objects on success, or an empty list on failure.
        """

        tool_str = tool_str.strip()
        if tool_str.endswith('];'):
            tool_str = tool_str[:-1]; logger.debug("Removed trailing semicolon.")

        parsed_tool_calls = []; parsed_ok = False; parsed_literal = None
        try: # Strategy 1: literal_eval
            logger.debug(f"Cleaned string for literal_eval: '{tool_str}'")
            parsed_literal = ast.literal_eval(tool_str)
            if isinstance(parsed_literal, list) and all(isinstance(item, dict) for item in parsed_literal):
                 temp_calls = []; all_items_valid = True
                 if not parsed_literal: parsed_ok = True; logger.debug("literal_eval parsed empty list [].")
                 else:
                      for item_dict in parsed_literal:
                           try:
                                if item_dict.get("type") == "function" and (item_dict.get("function") or item_dict.get("name")): temp_calls.append(_handle_dict_format(item_dict))
                                else: all_items_valid = False; break
                           except _UnexpectedFormatError: all_items_valid = False; break
                      if all_items_valid and temp_calls: parsed_tool_calls = temp_calls; parsed_ok = True; logger.debug("Parsed successfully using literal_eval (dict format).")
                      elif not all_items_valid: logger.debug("literal_eval parsed list of dicts, but content invalid.")
            else: logger.debug("literal_eval did not yield list of dicts.")
        except Exception as e: logger.debug(f"literal_eval failed: {e}. Trying ast.parse.")

        if not parsed_ok: # Strategy 2: ast.parse
            try:
                logger.debug(f"Cleaned string for ast.parse: '{tool_str}'")
                module = ast.parse(tool_str, mode='eval')
                parsed_expr = module.body
                if isinstance(parsed_expr, ast.Call):
                     logger.debug("Parsed as single AST Call node.")
                     try: parsed_tool_calls = [_handle_ast_call_node(parsed_expr)]; parsed_ok = True; logger.debug("Parsed successfully using ast.parse (single call format).")
                     except _UnexpectedFormatError as handler_e: logger.warning(f"Failed to handle single AST call node: {handler_e}")
                elif isinstance(parsed_expr, ast.List):
                     if not parsed_expr.elts: parsed_ok = True; logger.debug("ast.parse parsed empty list [].")
                     elif all(isinstance(e, ast.Call) for e in parsed_expr.elts):
                          logger.debug(f"Parsed as list of AST calls: {len(parsed_expr.elts)}")
                          temp_calls = []; all_calls_valid = True
                          for e in parsed_expr.elts:
                               try: temp_calls.append(_handle_ast_call_node(e))
                               except _UnexpectedFormatError as handler_e: all_calls_valid = False; logger.warning(f"Skipping invalid AST call node in list: {handler_e}"); break
                          if all_calls_valid and temp_calls: parsed_tool_calls = temp_calls; parsed_ok = True; logger.debug("Parsed successfully using ast.parse (list of calls format).")
                          elif not all_calls_valid: logger.debug("ast.parse parsed list of calls, but content invalid.")
                     else: logger.debug(f"ast.parse parsed List, but elements not all Calls: {[type(e) for e in parsed_expr.elts]}")
                else: logger.debug(f"ast.parse yielded unexpected type: {type(parsed_expr)}")
            except Exception as e: logger.debug(f"ast.parse failed: {e}")

        if parsed_ok: return parsed_tool_calls
        else:
             if not (isinstance(parsed_literal, list) and not parsed_literal): logger.warning(f"Failed to parse tool string using any strategy: '{tool_str[:100]}...'")
             return []


    def extract_tool_calls(
        self, model_output: str, request: ChatCompletionRequest
    ) -> ExtractedToolCallInformation:
        """
        Extract tool calls from a complete model response.
        Uses markers if present, handles semicolon, tries both parse strategies.
        """

        self.current_tool_index = 0; self.streamed_args_for_tool = []; self.in_tool_call_stream = False; self.prev_tool_call_arr = []; self.stream_buffer = ""
        tool_calls = []; content = model_output; tools_called = False
        try:
            tool_str_content = self._extract_content_between_markers(model_output)
            parse_target = None
            if tool_str_content is not None:
                logger.debug("Found content between markers.")
                if self.TOOL_CALL_END_MARKER in model_output: parse_target = tool_str_content; content = None
                else: logger.warning("Found start marker but end marker missing. Treating as text.")
            elif self.TOOL_CALL_START_MARKER not in model_output:
                 logger.debug("No markers found, attempting to parse entire output.")
                 parse_target = model_output.strip()
            if parse_target is not None:
                 try:
                      tool_calls = self._clean_and_parse(parse_target) # Returns [] on failure
                      if isinstance(tool_calls, list) and tool_calls: tools_called = True; content = None; logger.info(f"Successfully extracted {len(tool_calls)} tool calls.")
                      elif isinstance(tool_calls, list) and not tool_calls and parse_target.strip() == '[]': tools_called = False; content = None; logger.info("Parsed empty tool call list [].")
                      elif isinstance(tool_calls, list) and not tool_calls and parse_target.strip():
                           logger.warning(f"Tool call parsing failed for target: '{parse_target[:100]}...'. Treating as text content.");
                           if parse_target == model_output.strip(): content = model_output
                           else: content = None
                           tools_called = False; tool_calls = []
                      else: tools_called = False; tool_calls = []; content = model_output
                 except Exception:
                      logger.exception("Unexpected error during _clean_and_parse call.")
                      # Fall back to plain-text content only when parsing raised.
                      if parse_target == model_output.strip(): content = model_output
                      tools_called = False; tool_calls = []
        except Exception as e: logger.exception("Unexpected error during tool call extraction wrapper."); content = model_output; tools_called = False; tool_calls = []
        if not isinstance(tool_calls, list): logger.error(f"Parser yielded non-list tool_calls: {tool_calls}. Resetting."); tool_calls = []; tools_called = False; content = model_output
        return ExtractedToolCallInformation(tools_called=tools_called, tool_calls=tool_calls, content=content)


    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        """
        Extracts tool calls during streaming, using markers and simplified parse-on-end logic.
        Handles single call or list of calls. Returns None on intermediate errors or parse failure.
        """
        if not previous_text:
             logger.debug("Resetting streaming state for new request.")
             self.current_tool_index = 0; self.streamed_args_for_tool = []; self.in_tool_call_stream = False; self.prev_tool_call_arr = []; self.stream_buffer = ""

        self.stream_buffer = current_text

        # --- State Check: Are we in a tool call stream? ---
        if not self.in_tool_call_stream:

            stripped_buffer = self.stream_buffer.strip()
            if stripped_buffer.startswith(self.TOOL_CALL_START_MARKER):
                 logger.debug("Detected start marker. Entering tool call stream.")
                 self.in_tool_call_stream = True; return None
            elif stripped_buffer.startswith('['):
                 start_bracket_idx = self.stream_buffer.find('[')
                 text_before = self.stream_buffer[:start_bracket_idx].strip()
                 if len(text_before) < 5: logger.debug("Detected '[' start. Entering tool call stream (fallback)."); self.in_tool_call_stream = True; return None
                 else: self.in_tool_call_stream = False
            else: self.in_tool_call_stream = False

        # --- If not in tool stream, yield content ---
        if not self.in_tool_call_stream:
             assert isinstance(self.prev_tool_call_arr, list)
             logger.debug("Not in tool stream, yielding content delta.")
             return DeltaMessage(content=delta_text)

        # --- In tool stream: Buffer and wait for end marker ---
        logger.debug("In tool stream, buffering.")
        if self.TOOL_CALL_END_MARKER in self.stream_buffer:
             logger.debug(f"End marker '{self.TOOL_CALL_END_MARKER}' detected. Attempting final parse.")
             tool_str_content = self._extract_content_between_markers(self.stream_buffer)

             self.in_tool_call_stream = False # Attempt parse now, stream state ends

             parsed_tool_calls = []
             parse_failed = False
             if tool_str_content is not None:
                  try:
                       parsed_tool_calls = self._clean_and_parse(tool_str_content) # Returns [] on failure
                       if not parsed_tool_calls and tool_str_content.strip() and tool_str_content.strip() != '[]':
                            logger.error(f"Streaming: Final parse failed after end marker for content: '{tool_str_content[:100]}...'")
                            parse_failed = True
                       else: logger.debug(f"Final parse yielded {len(parsed_tool_calls)} calls.")
                  except Exception as e: logger.exception("Streaming: Unexpected error during final parse."); parse_failed = True
             else: logger.error("End marker detected, but failed to extract content."); parse_failed = True

             # --- Return None on failure, otherwise calculate deltas ---
             if parse_failed: return None

             # --- Generate Deltas based on successful parse ---
             tool_deltas = []
             for index, final_call in enumerate(parsed_tool_calls):
                  while len(self.streamed_args_for_tool) <= index: self.streamed_args_for_tool.append("")
                  delta = _compute_tool_delta(self.streamed_args_for_tool[index], final_call, index)
                  if delta:
                       tool_deltas.append(delta)
                       if delta.function and delta.function.arguments is not None:
                            if delta.id: self.streamed_args_for_tool[index] = delta.function.arguments
                            else: self.streamed_args_for_tool[index] += delta.function.arguments
             self.current_tool_index = len(parsed_tool_calls)

             # --- Hack carried over from the Llama 3.2 parser; not sure it is necessary ---
             # This ensures finish_reason is tool_calls if we successfully parsed
             # and generated any tool call deltas.
             if tool_deltas and not self.prev_tool_call_arr:
                 logger.debug("Setting prev_tool_call_arr hack for finish_reason.")
                 self.prev_tool_call_arr = [{"arguments": {}}] # Minimal structure


             assert isinstance(self.prev_tool_call_arr, list)

             if tool_deltas:
                  logger.info(f"Streaming: Sending final {len(tool_deltas)} tool call deltas.")
                  return DeltaMessage(tool_calls=tool_deltas)
             else:
                  logger.debug("Streaming: Final parse successful but no new deltas.")
                  # Parsed OK but no changes needed, return None to let finish_reason end it
                  return None
        else:
             # In tool stream, but end marker not yet seen
             return None # Continue buffering
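The parser's second strategy (ast.parse on the tool string, then reading keyword arguments off each ast.Call node) can be illustrated in isolation. This is a minimal standalone sketch, not the code above: parse_pythonic_calls is a hypothetical name, it has no vLLM dependency, and it swaps the hand-written _get_parameter_value for ast.literal_eval applied to each argument node (so tuples stay tuples instead of becoming lists).

```python
import ast


def parse_pythonic_calls(text: str) -> list[tuple[str, dict]]:
    """Parse '[f(a=1), g(b="x")]' or a bare 'f(a=1)' into (name, kwargs) pairs."""
    text = text.strip()
    if text.endswith("];"):
        text = text[:-1]  # tolerate a trailing semicolon, as the parser above does
    expr = ast.parse(text, mode="eval").body
    # A bare call is treated as a one-element list.
    calls = expr.elts if isinstance(expr, ast.List) else [expr]
    result = []
    for call in calls:
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError(f"not a simple function call: {ast.dump(call)}")
        if call.args:
            raise ValueError(f"{call.func.id}: only keyword arguments are supported")
        kwargs = {}
        for kw in call.keywords:
            if kw.arg is None:  # rejects **kwargs unpacking
                raise ValueError(f"{call.func.id}: **kwargs is not supported")
            # literal_eval on a node only accepts literals: strings, numbers,
            # lists, dicts, tuples, sets, True/False/None and signed numbers.
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        result.append((call.func.id, kwargs))
    return result


calls = parse_pythonic_calls(
    '[get_weather(city="Paris", days=3), search(query="vllm", verbose=True)];'
)
print(calls)
# [('get_weather', {'city': 'Paris', 'days': 3}), ('search', {'query': 'vllm', 'verbose': True})]
```

The parser above additionally serializes each kwargs dict with json.dumps and wraps it in a ToolCall with a fresh call_ id.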

Chat template:

{{- '<|begin_of_text|>' -}}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = false %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {# Default date if function unavailable #}
        {%- set date_string = "10 Apr 2025" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- Extract system message if present, otherwise use default #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question." %}
{%- endif %}

{#- System Prompt Section #}
{{- '<|header_start|>system<|header_end|>\n\n' -}}
{%- if tools is not none %}
    {# Include Environment: ipython if necessary for Llama 4 context #}
    {{- 'Environment: ipython\n' -}}
{%- endif %}
{{- 'Cutting Knowledge Date: August 2024\n' -}} {# Update based on Llama 4 model card #}
{{- 'Today Date: ' + date_string + '\n\n' -}}
{%- if tools is not none and not tools_in_user_message %}
    {# Tool definitions and Pythonic format instruction in System Prompt #}
    {{- 'You have access to the following functions. To call functions, please respond *only* with a Python list of the calls. ' -}}
    {{- 'Respond in the format [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] ' -}}
    {{- 'Do not use variables. If no function call is needed, respond naturally.\n\n' -}}
    {%- for t in tools %}
        {# Embed tool schema (JSON format) #}
        {{- t | tojson(indent=4) -}}
        {{- '\n\n' -}}
    {%- endfor %}
{%- endif %}
{{- system_message -}}
{{- '<|eot|>' -}} {# End of Turn for system message #}

{#- Handle custom tools passed in the first user message #}
{%- if tools_in_user_message and not tools is none %}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|header_start|>user<|header_end|>\n\n' -}}
    {# Tool definitions and Pythonic format instruction in User Prompt #}
    {{- "Given the following functions, please respond with a python list for function calls " -}}
    {{- "with their proper arguments to best answer the given prompt.\n\n" -}}
    {{- 'Respond *only* in the format [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)] ' -}}
    {{- "Do not use variables. If no function call is needed, respond naturally.\n\n" -}}
    {%- for t in tools %}
        {# Embed tool schema (JSON format) #}
        {{- t | tojson(indent=4) -}}
        {{- '\n\n' -}}
    {%- endfor %}
    {{- first_user_message -}}
    {{- '<|eot|>' -}} {# End of Turn for the first user message with tools #}
{%- endif %}

{#- Process remaining conversation history #}
{%- for message in messages %}
    {# --- Check for image data associated with this message --- #}
    {# --- This assumes image data might be in message.image_data --- #}
    {%- if message.image_data is defined and message.image_data %}
        {{- '<|image_start|><|image|>' -}}
        {# --- Logic to include actual image tokens/data would go here --- #}
    {%- endif %}

    {%- if message.role == 'user' %}
        {{- '<|header_start|>user<|header_end|>\n\n' + message['content'] | trim -}}
        {# Add <|image_end|> only if image was started for this message #}
        {%- if message.image_data is defined and message.image_data %}{{- '<|image_end|>'}}{%- endif -%}
        {{- '<|eot|>' -}}
    {%- elif message.role == 'assistant' %}
        {{- '<|header_start|>assistant<|header_end|>\n\n' -}}
        {%- if 'tool_calls' in message and message.tool_calls | length > 0 %}
            {# Assistant response is previous tool calls (Pythonic format) #}
            {# Check for potential start/end markers like <|python_tag|> if Llama 4 uses them #}
            {# {%- set needs_python_tag = true %} #} {# Example flag based on future findings #}
            {# {%- if needs_python_tag %}<|python_tag|> {%- endif %} #}
            {{- '[' -}}
            {%- for tool_call in message.tool_calls %}
                {%- set func = tool_call.function %}
                {{- func.name + '(' -}}
                {%- set arguments = func.arguments %} {# Assume func.arguments is already a dictionary #}
                {%- for name, value in arguments.items() %}
                    {{- name + '=' -}}
                    {# Ensure proper string representation for different types #}
                    {%- if value is string -%}
                        {{- '"%s"' | format(value | replace('"', '\\"')) -}}
                    {%- elif value is boolean -%}
                        {{- 'True' if value else 'False' -}}
                    {%- elif value is none -%}
                        {{- 'None' -}}
                    {%- else -%}
                        {{- value -}}
                    {%- endif -%}
                    {% if not loop.last %}, {% endif %}
                {%- endfor %}
                {{- ')' -}}
                {% if not loop.last %}, {% endif %}
            {%- endfor %}
            {{- ']' -}}
            {# {%- if needs_python_tag %} <|eot|> {%- endif %} #} {# End tag if needed #}
        {%- else %}
            {# Assistant response is plain text #}
            {{- message['content'] | trim -}}
        {%- endif %}
        {# Add <|image_end|> only if image was started for this message #}
        {%- if message.image_data is defined and message.image_data %}{{- '<|image_end|>'}}{%- endif -%}
        {{- '<|eot|>' -}} {# End of Turn for assistant #}
    {%- elif message.role == 'tool' or message.role == 'ipython' %}
        {# Tool execution result, using 'ipython' role header #}
        {{- '<|header_start|>ipython<|header_end|>\n\n' -}}
        {%- if message.content is mapping %}
            {{- message.content | tojson -}}
        {%- else %}
            {# Ensure content is JSON formatted, wrap raw string if needed #}
            {%- set tool_output = message.content | trim -%}
            {%- if tool_output.startswith('{') and tool_output.endswith('}') %}
                 {{- tool_output -}}
            {%- else -%}
                 {{- {"output": tool_output} | tojson -}}
            {%- endif -%}
        {%- endif %}
        {# Add <|image_end|> only if image was started for this message #}
        {%- if message.image_data is defined and message.image_data %}{{- '<|image_end|>'}}{%- endif -%}
        {{- '<|eom|>' -}} {# End of Message for tool result #}
    {%- endif %}
{%- endfor %}

{#- Add generation prompt for the model to start its response #}
{%- if add_generation_prompt %}
    {{- '<|header_start|>assistant<|header_end|>\n\n' -}}
{%- endif %}
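For tool calls in the conversation history, the assistant branch of the template re-serializes each call in the same Pythonic list format the model is asked to produce. A rough Python equivalent, handy for checking what the template emits, is sketched below; render_pythonic_tool_calls is a hypothetical helper. Unlike the template, which assumes func.arguments is already a dictionary, it also accepts a JSON string. Like the template, it escapes double quotes but not backslashes.

```python
import json


def render_pythonic_tool_calls(tool_calls: list[dict]) -> str:
    """Mirror the template's assistant branch: [f(a="x", b=True), g()]."""
    rendered = []
    for call in tool_calls:
        func = call["function"]
        args = func["arguments"]
        if isinstance(args, str):  # OpenAI-style requests carry arguments as JSON text
            args = json.loads(args)
        parts = []
        for name, value in args.items():
            if isinstance(value, str):
                text = '"' + value.replace('"', '\\"') + '"'
            elif isinstance(value, bool):  # must precede the generic branch
                text = "True" if value else "False"
            elif value is None:
                text = "None"
            else:
                text = str(value)  # numbers, lists, dicts: Python repr, as Jinja prints them
            parts.append(f"{name}={text}")
        rendered.append(f"{func['name']}({', '.join(parts)})")
    return "[" + ", ".join(rendered) + "]"


history_calls = [
    {"function": {"name": "get_weather", "arguments": '{"city": "Paris", "metric": true}'}}
]
print(render_pythonic_tool_calls(history_calls))
# [get_weather(city="Paris", metric=True)]
```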

@yeqcharlotte yeqcharlotte moved this from Todo to In Progress in Llama-4 Features & Optimizations Apr 11, 2025
@yeqcharlotte
Collaborator

yeqcharlotte commented Apr 11, 2025

Sending #16463, which folks can use to iterate on.

@yeqcharlotte yeqcharlotte moved this from In Progress to Done in Llama-4 Features & Optimizations Apr 15, 2025