-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update ground truth for multi turn base #956
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @cedricvidal and welcome! Good catch on those typos!
I submitted a PR to your branch with my proposed changes for three of the entries. Daniel-Mash#1. Let me know what you think!
@@ -2,7 +2,7 @@ | |||
{"id": "multi_turn_base_1", "ground_truth": [["ls(a=True)"], ["cd(folder='workspace')", "mv(source='log.txt',destination='archive')"], ["cd(folder='archive')", "grep(file_name='log.txt',pattern='Error')"], ["tail(file_name='log.txt',lines=20)"]]} | |||
{"id": "multi_turn_base_2", "ground_truth": [["cd(folder='documents')", "touch(file_name='TeamNotes.txt')"], ["echo(content='Collaboration leads to success. Innovation ignites growth.',file_name='TeamNotes.txt')"], ["diff(file_name1='ideas.txt', file_name2='TeamNotes.txt')"], ["cp(source='TeamNotes.txt',destination='Archived')", "cd(folder='Archived')", "mv(source='TeamNotes.txt',destination='IdeasArchive.txt')"], ["cat(file_name='IdeasArchive.txt')"]]} | |||
{"id": "multi_turn_base_3", "ground_truth": [["find(path='.',name='test')"], ["cd(folder='projects')","cd(folder='photography')","cp(source='test_image1.jpg',destination='backup_tests')", "cp(source='test_document.txt',destination='backup_tests')"]]} | |||
{"id": "multi_turn_base_4", "ground_truth": [["ls(a=True)"], ["sort(file_name='report.txt')"], ["post_tweet(content='Initial report content More unsorted data Unsorted data', mentions=['@Julia'], tags=['#currenttechtrend'])"]]} | |||
{"id": "multi_turn_base_4", "ground_truth": [["ls(a=True)"], ["sort(file_name='report.txt')"], ["post_tweet(content='Initial report content\\nMore unsorted data\\nUnsorted data', mentions=['@Julia'], tags=['#currenttechtrend'])"]]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execution result of the previous function call sort(file_name='report.txt')
is "{\"sorted_content\": \"Initial report content Unsorted data More unsorted data\"}"
. So here there should be no \\n
in the tweet content.
This entry is not ideal because it's a bit misleading even for humans. The capital letters in that string will make people think that it is three lines concated together. A way to fix that is by updating the initial setting of this entry so that the file contains three lines of text, and then sorting can actually change their order.
@@ -154,7 +154,7 @@ | |||
{"id": "multi_turn_base_153", "ground_truth": [["verify_traveler_information(first_name='Michael', last_name='Thompson', date_of_birth='1950-01-01', passport_number='P12345678')"], ["get_flight_cost(travel_from='RMS', travel_to='GFD', travel_date='2024-12-15', travel_class='business')"], ["set_budget_limit(access_token='abc123xyz', budget_limit=5000)"], ["book_flight(access_token='abc123xyz', card_id='card1234', travel_date='2024-12-15', travel_from='RMS', travel_to='GFD', travel_class='business', travel_cost=640.0)"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"]]} | |||
{"id": "multi_turn_base_154", "ground_truth": [["verify_traveler_information(first_name='Michael', last_name='Smith', date_of_birth='1962-02-14', passport_number='P87654321')"], ["get_flight_cost(travel_from='ORD', travel_to='LAX', travel_date='2024-08-10', travel_class='economy')"], ["set_budget_limit(access_token='token_ABC123XYZ', budget_limit=1500)"], ["book_flight(access_token='token_ABC123XYZ', card_id='card1', travel_date='2024-08-10', travel_from='ORD', travel_to='LAX', travel_class='economy', travel_cost=180.0)"], ["view_messages_sent()"]]} | |||
{"id": "multi_turn_base_155", "ground_truth": [["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-11-15', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='id15583', travel_date='2024-11-15', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', insurance_cost=120.0, booking_id='3426812', card_id='id15583')"]]} | |||
{"id": "multi_turn_base_156", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-10-15', travel_class='first')", "book_flight(access_token='abc123xyz456', card_id='travel_card_12345', travel_date='2024-10-15', travel_from='SFO', travel_to='LAX', travel_class='first', travel_cost=1000.0)"], ["retrieve_invoice(access_token='abc123xyz456', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: last-minute complication with my reservation.')"], ["ticket_login(username='mthompson', password='securePass123')", "create_ticket(title='Urgent Flight Issue', priority=4)"], ["edit_ticket(ticket_id=0, updates={'status':'Urgent', 'priority':5})"]]} | |||
{"id": "multi_turn_base_156", "ground_truth": [["get_flight_cost(travel_from='SFO', travel_to='LAX', travel_date='2024-10-15', travel_class='first')", "book_flight(access_token='abc123xyz456', card_id='travel_card_12345', travel_date='2024-10-15', travel_from='SFO', travel_to='LAX', travel_class='first', travel_cost=1000.0)"], ["retrieve_invoice(access_token='abc123xyz456', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Urgent: last-minute complication with my reservation.')"], ["ticket_login(username='mthompson', password='securePass123')", "create_ticket(title='Urgent Flight Issue', priority=4)"], ["edit_ticket(ticket_id=0, updates={'status':'Urgent'})"]]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user query says Could you enhance the ticket's details by updating its status to 'Urgent' to reflect the highest priority?
, so we need to have the updates
argument as {'status':'Urgent', 'priority':5}
. Otherwise the priority will be kept at 4.
But I could see how someone could interpret this as, change only the status to urgent, since an urgent status might also mean the highest priority. I think updating the query wording is probably the best way to eliminate such ambiguity.
@@ -178,7 +178,7 @@ | |||
{"id": "multi_turn_base_177", "ground_truth": [["book_flight(access_token='abc123xyz', card_id='card_7629', travel_date='2024-12-15', travel_from='LAX', travel_to='HND', travel_class='business', travel_cost=1200.0)"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"]]} | |||
{"id": "multi_turn_base_178", "ground_truth": [["register_credit_card(access_token='abc123xyz', card_number='4012888888881881', expiration_date='12/2028', card_verification_number=465, cardholder_name='Michael Smith')"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='flight_001', insurance_cost=200.0, card_id='262919693687')", "retrieve_invoice(access_token='abc123xyz', booking_id='flight_001')"], ["contact_customer_support(booking_id='flight_001', message='Insurance-related issues that need resolution')"], ["ticket_login(username='msmith', password='SecurePass123')", "create_ticket(priority=4, title='Unsatisfactory Customer Support', description='')"], ["cancel_booking(access_token='abc123xyz', booking_id='flight_001')"]]} | |||
{"id": "multi_turn_base_179", "ground_truth": [["book_flight(access_token='abc123xyz', card_id='card_6789', travel_date='2024-11-12', travel_from='CRH', travel_to='JFK', travel_class='economy', travel_cost=850.0)"], ["purchase_insurance(access_token='abc123xyz', insurance_type='comprehensive', booking_id='3426812', insurance_cost=100.0, card_id='card_6789')"], ["retrieve_invoice(access_token='abc123xyz', booking_id='3426812')"], ["contact_customer_support(booking_id='3426812', message='Request for itinerary adjustments or refund due to family emergency.')"]]} | |||
{"id": "multi_turn_base_180", "ground_truth": [["compute_exchange_rate(base_currency='RMB', target_currency='USD', value=20000.0)", "set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-10-12', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='main_card', travel_date='2024-10-12', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"], [], [], ["ticket_login(username='mzhang', password='SecurePass123')", "create_ticket(priority=3, title='Invoice Discrepancy', description='Problem encountered with invoicing.')"]]} | |||
{"id": "multi_turn_base_180", "ground_truth": [["set_budget_limit(access_token='abc123xyz', budget_limit=2857.14)"], ["get_flight_cost(travel_from='LAX', travel_to='JFK', travel_date='2024-10-12', travel_class='business')", "book_flight(access_token='abc123xyz', card_id='main_card', travel_date='2024-10-12', travel_from='LAX', travel_to='JFK', travel_class='business', travel_cost=2400.0)"], ["cancel_booking(access_token='abc123xyz', booking_id='3426812')"], [], [], ["ticket_login(username='mzhang', password='SecurePass123')", "create_ticket(priority=3, title='Invoice Discrepancy', description='Problem encountered with invoicing.')"]]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since compute_exchange_rate
is a read operation, it won’t change the state even if we skip it. However, we still require the function call as part of the path to produce the correct final answer. If a function for currency conversion is available, the model should always rely on that rather than using any of its pre-trained exchange rates knowledge, since those may be outdated and currency values can change rapidly.
This PR includes fixes to the ground truth file in the Gorilla repository. While implementing Gorilla Multi-Turn BFCL-v3 to introduce multi-turn interactions in our platform xpander.ai for performance evaluation, I encountered several issues in the ground truth data. However, additional issues were found and will be submitted separately. This PR addresses those issues to improve accuracy and reliability.