AI API Eval Framework #618
Comments
@f-ei8ht Currently, lm-evaluation-harness is the most popular LLM eval framework. It supports evaluating models served via several commercial APIs as well as local inference APIs, but it is not user friendly and requires a coding background to use. This project aims to provide an easy way to evaluate AI API responses against any task benchmark. Read: LLM Benchmarks Explained: Everything on MMLU, HellaSwag, BBH, and Beyond.

Let us take the MMLU benchmark as an example and test a model served via the Ollama local API. In this feature, the user should be able to select the benchmark (MMLU) against which the API is evaluated. API Dash will read the benchmark dataset (downloading it if not available), process it, and create the API requests to be executed. Each API response received will be processed and used to calculate the benchmark score. Everything happens in a user-friendly manner, where the user can see the progress of the evaluation, pause/resume it, and easily visualize the end result.
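To make the flow above concrete, here is a rough Python sketch of what such an evaluation loop could look like. It is not the actual implementation: it assumes an Ollama instance running locally at http://localhost:11434, pulls an MMLU subset via the Hugging Face datasets package, and the model name ("llama3") and item limit are placeholders.

```python
# Rough sketch of the benchmark loop described above (MMLU via a local Ollama API).
# Assumes: Ollama is running on localhost:11434, the `datasets` and `requests`
# packages are installed, and "llama3" is whatever model the user has pulled.
import requests
from datasets import load_dataset

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama generate endpoint
MODEL = "llama3"                                     # placeholder model name
LETTERS = ["A", "B", "C", "D"]

def build_prompt(question, choices):
    """Format one MMLU item as a multiple-choice prompt."""
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"{question}\n{options}\nAnswer with a single letter (A, B, C or D)."

def ask_model(prompt):
    """Send one request to the local Ollama API and return the raw text reply."""
    resp = requests.post(OLLAMA_URL, json={"model": MODEL, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

def evaluate(subject="abstract_algebra", limit=20):
    """Run a small slice of MMLU and report accuracy, printing progress as we go."""
    data = load_dataset("cais/mmlu", subject, split="test")
    n = min(limit, len(data))
    correct = 0
    for i, item in enumerate(data.select(range(n))):
        reply = ask_model(build_prompt(item["question"], item["choices"]))
        # Naive answer extraction: take the first A/B/C/D character in the reply.
        predicted = next((ch for ch in reply.strip().upper() if ch in LETTERS), None)
        if predicted == LETTERS[item["answer"]]:
            correct += 1
        print(f"[{i + 1}/{n}] running accuracy: {correct / (i + 1):.2%}")
    return correct / n

if __name__ == "__main__":
    print("MMLU (abstract_algebra) score:", evaluate())
```

In the actual feature, the same loop would run inside API Dash (or a backend it talks to), with the print statement replaced by UI progress, pause/resume controls, and a result visualization.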
Hello @ashitaprasad, I have been exploring ways to integrate the AI API evaluation framework into API Dash and would love to discuss the best approach for implementation. Since API Dash is currently a Dart/Flutter-only project with no existing Python backend, I see two potential approaches:

1. Dart/Flutter-only: Implement AI API evaluation using Dart and Flutter libraries. Keeps everything self-contained within API Dash, but may have limitations in handling complex AI model evaluations.
2. Hybrid approach (Python backend + API Dash): Develop a separate Python backend (FastAPI) for AI evaluations, with API Dash communicating with it via HTTP requests. Allows more advanced AI benchmarks using Python's ML ecosystem.

I would like to hear your thoughts on which approach aligns best with API Dash's architecture and goals. Are there any preferred methodologies, constraints, or additional considerations I should be aware of?
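To illustrate what the hybrid approach could look like, here is a minimal FastAPI sketch. The endpoint name, request fields, and scoring stub are hypothetical rather than an agreed-upon design: API Dash would POST the benchmark name and the target API's URL, and the backend would run the evaluation and return a score.

```python
# Minimal sketch of a possible Python backend for the hybrid approach.
# Endpoint names and request fields are hypothetical; API Dash would call this
# service over HTTP (e.g. POST /evaluate) instead of running the eval in Dart.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI API Eval backend (sketch)")

class EvalRequest(BaseModel):
    benchmark: str          # e.g. "mmlu"
    target_api_url: str     # the API being evaluated, e.g. a local Ollama endpoint
    model: str              # model name to pass through to the target API
    limit: int = 20         # number of benchmark items to run

class EvalResult(BaseModel):
    benchmark: str
    items_evaluated: int
    score: float

def run_benchmark(req: EvalRequest) -> float:
    """Placeholder for the real evaluation loop (dataset download, request
    generation, response scoring). The MMLU/Ollama sketch earlier in this
    thread is one possible implementation of this function."""
    return 0.0  # dummy score until the real loop is wired in

@app.post("/evaluate", response_model=EvalResult)
def evaluate(req: EvalRequest) -> EvalResult:
    score = run_benchmark(req)
    return EvalResult(benchmark=req.benchmark, items_evaluated=req.limit, score=score)

# Run with: uvicorn eval_backend:app --port 8000
# API Dash would then send an ordinary HTTP request to http://localhost:8000/evaluate.
```

Whether this split is worth the extra moving part (a second process the user has to run) versus a pure Dart implementation is exactly the trade-off raised above.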
@GANGSTER0910 The requirement is clearly explained in the thread above.
Hey @ashitaprasad, I would like to work on this idea.
Sure @ShivamVashisth28 |
[Related Issue: #618] Add GSoC 2025 Idea Proposal for AI API Eval
Hi @ashitaprasad, I’m interested in working on the AI API Evaluation Framework for GSoC 2025. I have experience in AI and API evaluation and would like to contribute to this project. Are there any specific areas that still need contributors? I’d also like to discuss how I can differentiate my proposal from existing ones. Looking forward to your guidance!
Tell us about the task you want to perform and are unable to do so because the feature is not available
Develop an end-to-end AI API eval framework and integrate it into API Dash. This framework should (list is suggestive, not exhaustive):