Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update ground truth for multi turn base #956

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Daniel-Mash
Copy link

This PR includes fixes to the ground truth file in the Gorilla repository. While implementing Gorilla Multi-Turn BFCL-v3 to introduce multi-turn interactions in our platform xpander.ai for performance evaluation, I encountered several issues in the ground truth data. However, additional issues were found and will be submitted separately. This PR addresses those issues to improve accuracy and reliability.

@HuanzhiMao HuanzhiMao added the BFCL-Dataset BFCL Dataset-Related Issue label Mar 26, 2025
Copy link
Collaborator

@HuanzhiMao HuanzhiMao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @cedricvidal and welcome! Good catch on those typos!

I submitted a PR to your branch with my proposed changes for three of the entries. Daniel-Mash#1. Let me know what you think!

@@ -2,7 +2,7 @@
{"id": "multi_turn_base_1", "ground_truth": [["ls(a=True)"], ["cd(folder='workspace')", "mv(source='log.txt',destination='archive')"], ["cd(folder='archive')", "grep(file_name='log.txt',pattern='Error')"], ["tail(file_name='log.txt',lines=20)"]]}
{"id": "multi_turn_base_2", "ground_truth": [["cd(folder='documents')", "touch(file_name='TeamNotes.txt')"], ["echo(content='Collaboration leads to success. Innovation ignites growth.',file_name='TeamNotes.txt')"], ["diff(file_name1='ideas.txt', file_name2='TeamNotes.txt')"], ["cp(source='TeamNotes.txt',destination='Archived')", "cd(folder='Archived')", "mv(source='TeamNotes.txt',destination='IdeasArchive.txt')"], ["cat(file_name='IdeasArchive.txt')"]]}
{"id": "multi_turn_base_3", "ground_truth": [["find(path='.',name='test')"], ["cd(folder='projects')","cd(folder='photography')","cp(source='test_image1.jpg',destination='backup_tests')", "cp(source='test_document.txt',destination='backup_tests')"]]}
{"id": "multi_turn_base_4", "ground_truth": [["ls(a=True)"], ["sort(file_name='report.txt')"], ["post_tweet(content='Initial report content More unsorted data Unsorted data', mentions=['@Julia'], tags=['#currenttechtrend'])"]]}
{"id": "multi_turn_base_4", "ground_truth": [["ls(a=True)"], ["sort(file_name='report.txt')"], ["post_tweet(content='Initial report content\\nMore unsorted data\\nUnsorted data', mentions=['@Julia'], tags=['#currenttechtrend'])"]]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execution result of the previous function call sort(file_name='report.txt') is "{\"sorted_content\": \"Initial report content Unsorted data More unsorted data\"}". So here there should be no \\n in the tweet content.
This entry is not ideal because it's a bit misleading even for humans. The capital letters in that string will make people think that it is three lines concated together. A way to fix that is by updating the initial setting of this entry so that the file contains three lines of text, and then sorting can actually change their order.

@@ -154,7 +154,7 @@
{"id": "multi_turn_base_153", "ground_truth": [["verify_traveler_information(first_name='Michael', last_name='Thompson', date_of_birth='1950-01-01', passport_number='P12345678')"], ["get_flight_cost(travel_from='RMS', travel_to='GFD', travel_date='2024-12-15', travel_class='business')"], ["set_budget_limit(access_token='abc123xyz', budget_limit=5000)"], ["book_flight(access_token='abc123xyz', card_id='card1234', travel_date='2024-12-15', travel_from='RMS', travel_to='GFD', travel_class='business', travel_cost=640.0)"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_base_154", "ground_truth": [["verify_traveler_information(first_name='Michael', last_name='Smith', date_of_birth='1962-02-14', passport_number='P87654321')"], ["get_flight_cost(travel_from='ORD', travel_to='LAX', travel_date='2024-08-10', travel_class='economy')"], ["set_budget_limit(access_token='token_ABC123XYZ', budget_limit=1500)"], ["book_flight(access_token='token_ABC123XYZ', card_id='card1', travel_date='2024-08-10', travel_from='ORD', travel_to='LAX', travel_class='economy', travel_cost=180.0)"], ["view_messages_sent()"]]}
{"id": "multi_turn_base_155", "ground_truth": [["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-11-15', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='id15583', travel_date='2024-11-15', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', insurance_cost=120.0, booking_id='3426812', card_id='id15583')"]]}
{"id": "multi_turn_base_156", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-10-15', travel_class='first')", "book_flight(access_token='abc123xyz456', card_id='travel_card_12345', travel_date='2024-10-15', travel_from='SFO', travel_to='LAX', travel_class='first', travel_cost=1000.0)"], ["retrieve_invoice(access_token='abc123xyz456', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: last-minute complication with my reservation.')"], ["ticket_login(username='mthompson', password='securePass123')", "create_ticket(title='Urgent Flight Issue', priority=4)"], ["edit_ticket(ticket_id=0, updates={'status':'Urgent', 'priority':5})"]]}
{"id": "multi_turn_base_156", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-10-15', travel_class='first')", "book_flight(access_token='abc123xyz456', card_id='travel_card_12345', travel_date='2024-10-15', travel_from='SFO', travel_to='LAX', travel_class='first', travel_cost=1000.0)"], ["retrieve_invoice(access_token='abc123xyz456', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: last-minute complication with my reservation.')"], ["ticket_login(username='mthompson', password='securePass123')", "create_ticket(title='Urgent Flight Issue', priority=4)"], ["edit_ticket(ticket_id=0, updates={'status':'Urgent'})"]]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user query says Could you enhance the ticket's details by updating its status to 'Urgent' to reflect the highest priority?, so we need to have the updates argument as {'status':'Urgent', 'priority':5}. Otherwise the priority will be kept at 4.
But I could see how someone could interpret this as, change only the status to urgent, since an urgent status might also mean the highest priority. I think updating the query wording is probably the best way to eliminate such ambiguity.

@@ -178,7 +178,7 @@
{"id": "multi_turn_base_177", "ground_truth": [["book_flight(access_token='abc123xyz', card_id='card_7629', travel_date='2024-12-15', travel_from='LAX', travel_to='HND', travel_class='business', travel_cost=1200.0)"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"]]}
{"id": "multi_turn_base_178", "ground_truth": [["register_credit_card(access_token='abc123xyz', card_number='4012888888881881', expiration_date='12/2028', card_verification_number=465, cardholder_name='Michael Smith')"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='flight_001', insurance_cost=200.0, card_id='262919693687')", "retrieve_invoice(access_token='abc123xyz', booking_id='flight_001')"], ["contact_customer_support(booking_id='flight_001', message='Insurance-related issues that need resolution')"], ["ticket_login(username='msmith', password='SecurePass123')", "create_ticket(priority=4, title='Unsatisfactory Customer Support', description='')"], ["cancel_booking(access_token='abc123xyz', booking_id='flight_001')"]]}
{"id": "multi_turn_base_179", "ground_truth": [["book_flight(access_token='abc123xyz', card_id='card_6789', travel_date='2024-11-12', travel_from='CRH', travel_to='JFK', travel_class='economy', travel_cost=850.0)"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=100.0, card_id='card_6789')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Request for itinerary adjustments or refund due to family emergency.')"]]}
{"id": "multi_turn_base_180", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-10-12', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='main_card', travel_date='2024-10-12', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"], [], [], ["ticket_login(username='mzhang', password='SecurePass123')", "create_ticket(priority=3, title='Invoice Discrepancy', description='Problem encountered with invoicing.')"]]}
{"id": "multi_turn_base_180", "ground_truth": [["set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-10-12', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='main_card', travel_date='2024-10-12', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"], [], [], ["ticket_login(username='mzhang', password='SecurePass123')", "create_ticket(priority=3, title='Invoice Discrepancy', description='Problem encountered with invoicing.')"]]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since compute_exchange_rate is a read operation, it won’t change the state even if we skip it. However, we still require the function call as part of the path to produce the correct final answer. If a function for currency conversion is available, the model should always rely on that rather than using any of its pre-trained exchange rates knowledge, since those may be outdated and currency values can change rapidly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BFCL-Dataset BFCL Dataset-Related Issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants