docs: add an analysis of model size vs. tool-calling performance #931
Summary
This PR adds an analysis of how model size correlates with various performance metrics in the Berkeley Function-Calling Leaderboard.
Key Updates
Preprocessed leaderboard data (cleaning, filtering, and structuring).
Filtered out proprietary models, since most of them do not publish their model size.
Manually edited the model size column and summarised the result as a CSV file.
Correlation analysis between model size and accuracy features.
Visualisation with a heatmap to highlight trends.
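The steps above can be sketched as a small pandas pipeline. This is a minimal illustration with made-up data and hypothetical column names (`size_b`, `overall_acc`, `ast_acc`); the actual leaderboard CSV columns may differ:

```python
import pandas as pd

# Toy stand-in for the preprocessed leaderboard data
# (real data would come from the cleaned CSV described above).
df = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "size_b": [7, 13, 34, 70],           # model size in billions of parameters
    "overall_acc": [0.61, 0.66, 0.72, 0.78],
    "ast_acc": [0.58, 0.65, 0.70, 0.75],
})

# Correlation between model size and each accuracy feature.
corr = df[["size_b", "overall_acc", "ast_acc"]].corr()
print(corr.round(2))

# The heatmap step could then be rendered from `corr`, e.g. with
# seaborn.heatmap(corr, annot=True) or matplotlib's imshow.
```

On toy data like this the size/accuracy correlation comes out strongly positive; the PR's analysis applies the same computation to the full filtered leaderboard.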
Motivation
As part of my personal research, I aim to understand how model size impacts performance across different metrics and to identify patterns in scaling. I feel this analysis is also useful for general users, which is why I am raising a PR here.