AI API Eval Framework #618
Comments
@f-ei8ht Currently, lm-evaluation-harness is the most popular LLM eval framework. It supports evaluating models served via several commercial APIs as well as local inference APIs, but it is not user friendly and requires a coding background to use. This project aims to provide an easy way to evaluate AI API responses against any task benchmark. Read: LLM Benchmarks Explained: Everything on MMLU, HellaSwag, BBH, and Beyond.

Let us take the MMLU benchmark as an example and test a model served via the Ollama local API. In this feature, the user should be able to select the benchmark (MMLU) against which the API is evaluated. API Dash will read the benchmark dataset (downloading it if not available), process it, and create the API requests to be executed. Each API response received will be processed and used to calculate the benchmark score. Everything happens in a user-friendly manner, where the user can see the progress of the evaluation, pause/resume it, and easily visualize the end result.
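To make the flow above concrete, here is a rough Python sketch of what such an evaluation loop could look like. It is not the actual implementation: it assumes an Ollama instance running locally at http://localhost:11434, pulls an MMLU subset via the Hugging Face datasets package, and the model name ("llama3") and item limit are placeholders.

```python
# Rough sketch of the benchmark loop described above (MMLU via a local Ollama API).
# Assumes: Ollama is running on localhost:11434, the `datasets` and `requests`
# packages are installed, and "llama3" is whatever model the user has pulled.
import requests
from datasets import load_dataset

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama generate endpoint
MODEL = "llama3"                                     # placeholder model name
LETTERS = ["A", "B", "C", "D"]

def build_prompt(question, choices):
    """Format one MMLU item as a multiple-choice prompt."""
    options = "\n".join(f"{LETTERS[i]}. {c}" for i, c in enumerate(choices))
    return f"{question}\n{options}\nAnswer with a single letter (A, B, C or D)."

def ask_model(prompt):
    """Send one request to the local Ollama API and return the raw text reply."""
    resp = requests.post(OLLAMA_URL, json={"model": MODEL, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

def evaluate(subject="abstract_algebra", limit=20):
    """Run a small slice of MMLU and report accuracy, printing progress as we go."""
    data = load_dataset("cais/mmlu", subject, split="test")
    n = min(limit, len(data))
    correct = 0
    for i, item in enumerate(data.select(range(n))):
        reply = ask_model(build_prompt(item["question"], item["choices"]))
        # Naive answer extraction: take the first A/B/C/D character in the reply.
        predicted = next((ch for ch in reply.strip().upper() if ch in LETTERS), None)
        if predicted == LETTERS[item["answer"]]:
            correct += 1
        print(f"[{i + 1}/{n}] running accuracy: {correct / (i + 1):.2%}")
    return correct / n

if __name__ == "__main__":
    print("MMLU (abstract_algebra) score:", evaluate())
```

In the actual feature, the same loop would run inside API Dash (or a backend it talks to), with the print statement replaced by UI progress, pause/resume controls, and a result visualization.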
Hello @ashitaprasad, I have been exploring ways to integrate the AI API evaluation framework into API Dash and would love to discuss the best approach for implementation. Since API Dash is currently a Dart/Flutter-only project with no existing Python backend, I see two potential approaches:

1. Dart/Flutter-only: Implement AI API evaluation using Dart and Flutter libraries. Keeps everything self-contained within API Dash, but may have limitations in handling complex AI model evaluations.
2. Hybrid approach (Python backend + API Dash): Develop a separate Python backend (FastAPI) for AI evaluations, with API Dash communicating with it via HTTP requests. Allows more advanced AI benchmarks using Python's ML ecosystem.

I would like to hear your thoughts on which approach aligns best with API Dash's architecture and goals. Are there any preferred methodologies, constraints, or additional considerations I should be aware of?
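To illustrate what the hybrid approach could look like, here is a minimal FastAPI sketch. The endpoint name, request fields, and scoring stub are hypothetical rather than an agreed-upon design: API Dash would POST the benchmark name and the target API's URL, and the backend would run the evaluation and return a score.

```python
# Minimal sketch of a possible Python backend for the hybrid approach.
# Endpoint names and request fields are hypothetical; API Dash would call this
# service over HTTP (e.g. POST /evaluate) instead of running the eval in Dart.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AI API Eval backend (sketch)")

class EvalRequest(BaseModel):
    benchmark: str          # e.g. "mmlu"
    target_api_url: str     # the API being evaluated, e.g. a local Ollama endpoint
    model: str              # model name to pass through to the target API
    limit: int = 20         # number of benchmark items to run

class EvalResult(BaseModel):
    benchmark: str
    items_evaluated: int
    score: float

def run_benchmark(req: EvalRequest) -> float:
    """Placeholder for the real evaluation loop (dataset download, request
    generation, response scoring). The MMLU/Ollama sketch earlier in this
    thread is one possible implementation of this function."""
    return 0.0  # dummy score until the real loop is wired in

@app.post("/evaluate", response_model=EvalResult)
def evaluate(req: EvalRequest) -> EvalResult:
    score = run_benchmark(req)
    return EvalResult(benchmark=req.benchmark, items_evaluated=req.limit, score=score)

# Run with: uvicorn eval_backend:app --port 8000
# API Dash would then send an ordinary HTTP request to http://localhost:8000/evaluate.
```

Whether this split is worth the extra moving part (a second process the user has to run) versus a pure Dart implementation is exactly the trade-off raised above.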
@GANGSTER0910 The requirement is clearly explained in the thread above.
Hey @ashitaprasad, I would like to work on this idea.
Sure @ShivamVashisth28 |
[Related Issue: #618] Add GSoC 2025 Idea Proposal for AI API Eval
Hi @ashitaprasad, I’m interested in working on the AI API Evaluation Framework for GSoC 2025. I have experience in AI and API evaluation and would like to contribute to this project. Are there any specific areas that still need contributors? I’d also like to discuss how I can differentiate my proposal from existing ones. Looking forward to your guidance!
Tell us about the task you want to perform and are unable to do so because the feature is not available
Develop an end-to-end AI API eval framework and integrate it into API Dash. This framework should (list is suggestive, not exhaustive):