
Add ThinkAgents/ThinkAgent-1B #928

Open · wants to merge 21 commits into main
Conversation

@0xayman commented Mar 3, 2025

Add ThinkAgents/ThinkAgent-1B model and model handler

@HuanzhiMao (Collaborator) left a comment

Thanks for the PR, @0xayman!
Is there a reason why the `_format_prompt` method uses different chat-template logic than the one linked on Hugging Face (link)?

@0xayman (Author) commented Mar 13, 2025

When I used the default one, the generations were messy, so I changed it to the same formatting function I used for fine-tuning the model.

@HuanzhiMao (Collaborator)

> When I used the default one, the generations were messy, so I changed it to the same formatting function I used for fine-tuning the model.

In that case, if you’re using a custom chat template that provides better generation results, please document it in the model card. This way, users will know exactly how to replicate your function-calling setup, and we’ll benchmark the model using your recommended approach so the score accurately reflects the typical user experience.

@0xayman (Author) commented Mar 17, 2025

I have updated the model tokenizer to use the correct chat template used in the `_format_prompt` function. Please review it and let me know if further updates are needed.

@HuanzhiMao (Collaborator)

I think the chat template and the `_format_prompt` function are still misaligned.
For example, your chat template has the following section:

{%- for message in messages %}
    {%- if not (message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>' + message['content'] | trim + '<|eot_id|>' }}
        ...
    {%- endif %}
{%- endfor %}

but in your `_format_prompt`, you only have this:

for message in messages:
    formatted_prompt += f"{message['content']}<|eot_id|>\n"

Notice how the `<|start_header_id|>` and `<|end_header_id|>` tokens are missing.
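
For reference, a loop that matched the template above would wrap each message in the role header tokens; a minimal sketch (not the PR's actual code, and `format_messages` is a hypothetical helper):

def format_messages(messages: list[dict]) -> str:
    # Mirror the Jinja branch above: every non-tool message gets the
    # Llama 3 role header tokens, not just its raw content.
    prompt = ""
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{message['content'].strip()}<|eot_id|>"
        )
    return prompt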

@HuanzhiMao (Collaborator)

@0xayman Would you be fine if I directly made modifications to your branch? I can also raise a PR to your branch instead. Either way is fine with me.

@0xayman (Author) commented Mar 18, 2025

Yes, it's fine; you can go ahead and make any required modifications.

@HuanzhiMao (Collaborator)

According to your chat template, are you using the `tools_in_user_message` approach or not?

@0xayman (Author) commented Mar 19, 2025

Yes, I'm passing the tools in the user message instead of the system prompt. I found this to work better.
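
For context, Llama-3.1-style chat templates usually expose this choice as a template variable; a minimal sketch, assuming a transformers tokenizer whose template supports the `tools_in_user_message` flag (the example messages and tools are placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ThinkAgents/ThinkAgent-1B")

messages = [{"role": "user", "content": "Find Italian restaurants in NYC."}]
tools = [...]  # JSON-schema tool definitions

# Extra kwargs to apply_chat_template are forwarded to the Jinja template,
# so a template can branch on tools_in_user_message.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tools_in_user_message=True,  # inject tool docs into the first user turn
    tokenize=False,
    add_generation_prompt=True,
)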

@HuanzhiMao (Collaborator)

I got the following for data_overall.csv. Does this align with what you obtained?

Rank,Overall Acc,Model,Model Link,Cost ($ Per 1k Function Calls),Latency Mean (s),Latency Standard Deviation (s),Latency 95th Percentile (s),Non-Live AST Acc,Non-Live Simple AST,Non-Live Multiple AST,Non-Live Parallel AST,Non-Live Parallel Multiple AST,Non-Live Exec Acc,Non-Live Simple Exec,Non-Live Multiple Exec,Non-Live Parallel Exec,Non-Live Parallel Multiple Exec,Live Acc,Live Simple AST,Live Multiple AST,Live Parallel AST,Live Parallel Multiple AST,Multi Turn Acc,Multi Turn Base,Multi Turn Miss Func,Multi Turn Miss Param,Multi Turn Long Context,Relevance Detection,Irrelevance Detection,Organization,License
1,18.03%,ThinkAgent-1B,https://huggingface.co/ThinkAgents/ThinkAgent-1B,N/A,11.12,18.31,61.04,56.38%,49.00%,73.00%,59.00%,44.50%,0.00%,0.00%,0.00%,0.00%,0.00%,27.05%,42.64%,28.11%,31.25%,16.67%,0.00%,0.00%,0.00%,0.00%,0.00%,61.11%,19.33%,ThinkAgents,apache-2.0

@0xayman (Author) commented Mar 19, 2025

Can you please share the csv file so I can check it?

@HuanzhiMao (Collaborator)

[shared the csv file as an attachment]

@0xayman (Author) commented Mar 19, 2025

It is close to what I get for the parallel_multiple and multiple AST tests, but very far off in the simple and parallel AST tests.
These are the only four measures we're interested in.

@0xayman (Author) commented Mar 19, 2025

My latest evaluation records:
Simple: 77.25
Parallel Multiple: 45
Parallel: 65.5
Multiple: 71.5

@0xayman (Author) commented Mar 20, 2025

I've updated the handler and attached the latest evaluation results.
score.zip

@HuanzhiMao (Collaborator)

I generated another run, and attached the fully formatted prompt before it hit the completion endpoint for test case id simple_399. Is that expected? There seem to be conflicting formatting instructions in the system and user prompts, plus duplicated function documentation in both. That’s why I didn’t include the default system prompt earlier.

If everything looks good to you, I’ll go ahead and merge the PR and update the leaderboard with your model’s score!

<|begin_of_text|><|start_header_id|>system<|end_header_id|>Cutting Knowledge Date: December 2023Today Date: 07 Dec 2024You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\nIf none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out.\nYou should only return the function calls in your response.\n\nIf you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\nYou SHOULD NOT include any other text in the response.\n\nAt each turn, you should try your best to complete the tasks requested by the user within the current turn. Continue to output functions to call until you have fulfilled the user's request to the best of your ability. Once you have no more functions to call, the system will consider the current turn complete and proceed to the next turn or task.\n\nHere is a list of functions in JSON format that you can invoke.\n[{'name': 'restaurant_search', 'description': 'Locates top rated restaurants based on specific criteria such as type of cuisine, ratings, and facilities. Note that the provided function is in Python 3 syntax.', 'parameters': {'type': 'dict', 'properties': {'location': {'type': 'string', 'description': 'The city and state, e.g. New York City, NY'}, 'cuisine': {'type': 'string', 'description': 'Preferred type of cuisine e.g., Italian, Indian, American, etc.'}, 'rating': {'type': 'integer', 'description': 'Minimum average customer rating out of 5'}, 'accepts_credit_cards': {'type': 'boolean', 'description': 'If the restaurant should accept credit cards.'}}, 'required': ['location', 'cuisine', 'rating', 'accepts_credit_cards']}}]<|eot_id|><|start_header_id|>user<|end_header_id|>Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.{\n "name": "restaurant_search",\n "description": "Locates top rated restaurants based on specific criteria such as type of cuisine, ratings, and facilities. Note that the provided function is in Python 3 syntax.",\n "parameters": {\n "location": {\n "type": "string",\n "description": "The city and state, e.g. New York City, NY"\n },\n "cuisine": {\n "type": "string",\n "description": "Preferred type of cuisine e.g., Italian, Indian, American, etc."\n },\n "rating": {\n "type": "integer",\n "description": "Minimum average customer rating out of 5"\n },\n "accepts_credit_cards": {\n "type": "boolean",\n "description": "If the restaurant should accept credit cards."\n }\n }\n}Find me the best Italian restaurants in New York City with average customer ratings of more than 4 and accepts credit cards.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

@0xayman (Author) commented Mar 22, 2025

Yes, the default system prompt should not be included. Here is the formatting function I was using initially:

def _format_prompt(self, messages, function):
    # We first format the function signature and then add the messages
    tools = self._convert_functions_format(function)

    formatted_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 07 Dec 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {{"name": function name, "parameters": dictionary of argument name and its value}}.Do not use variables.

{tools}

"""

    for message in messages:
        formatted_prompt += f"{message['content']}<|eot_id|>\n"

    formatted_prompt += "<|start_header_id|>assistant<|end_header_id|>\n"
    return formatted_prompt

Can you please tell me if there are any particular reasons we can't use it?

@HuanzhiMao (Collaborator)

> Yes, the default system prompt should not be included. Here is the formatting function I was using initially: [...] Can you please tell me if there are any particular reasons we can't use it?

As explained here, the issue with your formatting function is that it is not aligned with what the chat template on the Hugging Face model card suggests.
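
One way to check that alignment is to compare the handler's output against the template itself; a minimal sketch, assuming a transformers tokenizer and that `handler`, `messages`, and `tools` stand in for the handler instance and its inputs (hypothetical names, not the repo's API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ThinkAgents/ThinkAgent-1B")
reference = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
# The hand-rolled prompt should reproduce the template's output exactly.
assert handler._format_prompt(messages, tools) == reference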

@0xayman (Author) commented Mar 22, 2025

I think I misunderstood the last message, but where is this part of the prompt coming from:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>Cutting Knowledge Date: December 2023Today Date: 07 Dec 2024You are an expert in composing functions. You are given a question and a set of possible functions. [...] Here is a list of functions in JSON format that you can invoke.\n[{'name': 'restaurant_search', ...}]<|eot_id|>

If I'm not mistaken, the formatting function in the current version of the code uses the following logic:

formatted_prompt = "<|begin_of_text|>"

system_message = ""
remaining_messages = messages
if messages[0]["role"] == "system":
    system_message = messages[0]["content"].strip()
    remaining_messages = messages[1:]

formatted_prompt += "<|start_header_id|>system<|end_header_id|>"
formatted_prompt += "Cutting Knowledge Date: December 2023"
formatted_prompt += "Today Date: 07 Dec 2024"
formatted_prompt += system_message + "<|eot_id|>"

It cuts off the default system prompt and then appends my custom system prompt.

@HuanzhiMao (Collaborator)

The default system prompt is not cut off. It's still included through these two lines.

system_message = messages[0]["content"].strip()
formatted_prompt += system_message + "<|eot_id|>"
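
If the goal is to drop the default system prompt entirely, the handler would need to skip that append; a minimal sketch, assuming the benchmark supplies its default instructions as a leading system message:

remaining_messages = messages
if messages and messages[0]["role"] == "system":
    # Discard the default system message rather than re-appending its content.
    remaining_messages = messages[1:]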

@0xayman (Author) commented Mar 22, 2025

Please review the latest commit I've made.
I removed the redundant system prompt. Now the model should be fine.

@0xayman (Author) commented Mar 27, 2025

@HuanzhiMao Just a reminder to check if everything is going well.

@HuanzhiMao (Collaborator) commented Mar 28, 2025

Regarding your last commit, 7f1f62f, it makes sense to not include the default system prompt here. However, these changes don't make sense. They have nothing to do with the system prompt, and you are not following your own chat template.

For example, this part of the chat template, covering the function doc format, does not translate to just `formatted_prompt += f"{tools}\n"`:

{%- for t in tools %}
    {{- {"name": t.name, "description": t.description, "parameters": t.parameters.properties} | tojson(indent=4) }}
    {{- "" }}
{%- endfor %}
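
For reference, a rough Python equivalent of that Jinja loop might be (a sketch, assuming `tools` is a list of dicts matching the template's schema):

import json

for t in tools:
    # Serialize each tool doc individually with 4-space indentation,
    # matching the template's tojson(indent=4) filter.
    formatted_prompt += json.dumps(
        {
            "name": t["name"],
            "description": t["description"],
            "parameters": t["parameters"]["properties"],
        },
        indent=4,
    ) + "\n"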

@0xayman (Author) commented Mar 28, 2025

I've fixed the chat template and made a new commit.

@0xayman (Author) commented Apr 2, 2025

@HuanzhiMao Any updates so far?

@HuanzhiMao (Collaborator)

> Regarding your last commit, 7f1f62f, it makes sense to not include the default system prompt here. However, these changes don't make sense. [...] For example, this part of the chat template, covering the function doc format, does not translate to just `formatted_prompt += f"{tools}\n"`.

I believe you haven't addressed my above concern in your new commit.

@0xayman (Author) commented Apr 5, 2025

The part you've mentioned is no longer part of the chat_template; it was replaced with this code in the last commit. The model's chat_template has also been updated on Hugging Face.

@HuanzhiMao (Collaborator)

@0xayman (Author) commented Apr 9, 2025

Are you sure you are viewing the latest commit? https://github.com/0xayman/gorilla/tree/382b4957f60a3245c37a5446a2a96cb758e645f6

@HuanzhiMao (Collaborator)

@0xayman (Author) commented Apr 10, 2025

I'm not sure if it will affect the results, but I've removed them to be consistent.
Please let me know if further updates are needed.
