
GSOC AI_API_EVAL Poc submission #717


Closed
wants to merge 21 commits into from
21 commits
4df20d2
Merge pull request #1 from GANGSTER0910/idea_submission-harsh
GANGSTER0910 Mar 20, 2025
44224ce
Merge branch 'foss42:main' into main
GANGSTER0910 Mar 26, 2025
ae2c7dd
Create poc_harsh_panchal_AI_API_EVAL.md
GANGSTER0910 Mar 26, 2025
5a365b4
images_poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 26, 2025
1110f4c
Rename Screenshot 2025-03-26 211623.png to dashboard_image.png
GANGSTER0910 Mar 26, 2025
77bc522
poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 26, 2025
1104340
poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 26, 2025
9c885a6
poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 26, 2025
9a30a61
poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 26, 2025
648d7c7
Delete doc/proposals/2025/gsoc/images/dashboard2.png
GANGSTER0910 Mar 26, 2025
4c5bb50
added results images poc_harsh_panchal_AI_API_EVAL.md
GANGSTER0910 Mar 26, 2025
d0f87e6
added code and integrate to main apidash
GANGSTER0910 Mar 27, 2025
a5c1412
poc_harsh_panchal_images
GANGSTER0910 Mar 27, 2025
714d199
Delete doc/proposals/2025/gsoc/images/dashboard3.png
GANGSTER0910 Mar 27, 2025
8fc7298
Delete doc/proposals/2025/gsoc/images/dashboard_image.png
GANGSTER0910 Mar 27, 2025
9fca965
Update poc_harsh_panchal_AI_API_EVAL.md
GANGSTER0910 Mar 27, 2025
88c0bdc
update code
GANGSTER0910 Mar 27, 2025
00cfe42
Merge branch 'poc-submission' of https://github.com/GANGSTER0910/apid…
GANGSTER0910 Mar 27, 2025
8fc4581
update model_eval
GANGSTER0910 Mar 27, 2025
39a3459
update poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 27, 2025
ec45bff
Update evalution.dart for poc_harsh_panchal_AI_API_EVAL
GANGSTER0910 Mar 28, 2025
Binary file added doc/proposals/2025/gsoc/images/dashboard 2.png
Binary file added doc/proposals/2025/gsoc/images/dashboard1.png
Binary file added doc/proposals/2025/gsoc/images/results.png
63 changes: 63 additions & 0 deletions doc/proposals/2025/gsoc/poc_harsh_panchal_AI_API_EVAL.md
@@ -0,0 +1,63 @@
# AI API Evaluation Framework - Proof of Concept
This is a Proof of Concept (PoC) for a structured AI API evaluation framework. It benchmarks AI language models on several performance metrics and presents the results in easy-to-interpret visualizations. The PoC covers the full pipeline, from API calls to result analysis.

## Objectives

Benchmark AI models (Falcon 7B, LLaMA 3.2) on metrics such as BLEU-4, ROUGE-L, BERTScore, and METEOR.

Use radar charts to compare models visually.

Monitor model performance through real-time latency and cost measurement.

---
## Key Features Implemented
### Backend (FastAPI)
Model Evaluation Endpoint: Tests AI models against given datasets.

Metric Computation: BLEU-4, ROUGE-L, BERTScore, and METEOR scores are computed for the outputs of models accessed through Hugging Face.

Real-Time Performance Metrics: Tracks latency, cost, and processing time.
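
As an illustration of the latency and cost tracking described above, here is a minimal sketch. It is not taken from the PoC code: `timed_call`, `CallStats`, `fake_model`, the whitespace-based token count, and the per-1k-token price are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class CallStats:
    latency_s: float  # wall-clock duration of the call, in seconds
    tokens: int       # crude token estimate (hypothetical accounting)
    cost_usd: float   # tokens priced at an assumed rate per 1k tokens


def timed_call(fn: Callable[[str], str], prompt: str,
               usd_per_1k_tokens: float) -> Tuple[str, CallStats]:
    """Run fn(prompt), measuring latency and estimating cost."""
    start = time.perf_counter()
    output = fn(prompt)
    latency = time.perf_counter() - start
    # Assumption: approximate tokens by whitespace-separated words.
    tokens = len(prompt.split()) + len(output.split())
    cost = tokens / 1000 * usd_per_1k_tokens
    return output, CallStats(latency, tokens, cost)


def fake_model(prompt: str) -> str:
    # Stand-in for a real model API call (hypothetical).
    return "a short summary"


if __name__ == "__main__":
    text, stats = timed_call(fake_model, "summarize this article please", 0.5)
    print(text, stats)
```

A real backend would replace `fake_model` with the actual model request and use the provider's own token counts and pricing.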

### Frontend (Flutter)
Interactive Dashboard: Displays model scores in radar charts.

Real-Time Data Display: Presents evaluation results in a formatted view.

Model Selection: Lets users choose among the available AI models.
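
To show how a radar chart lays out metric scores, the sketch below maps each score to a polygon vertex. It is written in Python for illustration only (the PoC frontend is Flutter); `radar_vertices` is a hypothetical helper, and it assumes scores are already normalized to the range 0 to 1.

```python
import math
from typing import Dict, List, Tuple


def radar_vertices(scores: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Map metric scores in [0, 1] to (name, x, y) polygon vertices.

    Axes are spread evenly around the circle, starting at the top and
    going clockwise; each score is its distance from the centre.
    """
    n = len(scores)
    vertices = []
    for i, (name, value) in enumerate(scores.items()):
        angle = math.pi / 2 - 2 * math.pi * i / n
        vertices.append((name, value * math.cos(angle), value * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # Made-up scores for illustration, not results from the PoC.
    demo = {"BLEU-4": 1.0, "ROUGE-L": 0.5, "BERTScore": 1.0, "METEOR": 0.5}
    for name, x, y in radar_vertices(demo):
        print(f"{name}: ({x:.2f}, {y:.2f})")
```

Connecting the vertices in order gives one polygon per model, so overlaying two polygons makes per-metric differences visible at a glance.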

## Screenshots and Visuals
![alt text](https://github.com/GANGSTER0910/apidash/blob/8fc7298824670397b07d0b42307bb1dd533af1fe/doc/proposals/2025/gsoc/images/dashboard1.png)

![alt text](https://github.com/GANGSTER0910/apidash/blob/8fc7298824670397b07d0b42307bb1dd533af1fe/doc/proposals/2025/gsoc/images/dashboard%202.png)

![alt text](https://github.com/GANGSTER0910/apidash/blob/8fc7298824670397b07d0b42307bb1dd533af1fe/doc/proposals/2025/gsoc/images/results.png)

---
## Proof of Concept Details
Models Evaluated: LLaMA 3.2 (3B) and Falcon 7B.

Dataset: CNN/DailyMail (version 3.0.0), used for benchmarking.

Evaluation Metrics

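BLEU-4: Scores n-gram overlap (up to 4-grams) between the generated text and the reference.

ROUGE-L: Measures the longest common subsequence, i.e. words matched in order but not necessarily consecutively.

BERTScore: Scores the similarity of contextual embeddings of generated and reference tokens.

METEOR: Considers synonyms, stemming, and word order.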
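
The two overlap-based metrics above can be sketched in a few lines. This is a simplified illustration, not the PoC's implementation: `ngram_precision` computes only the clipped n-gram precision that BLEU builds on (full BLEU-4 combines the precisions for n = 1 to 4 with a brevity penalty), and `rouge_l_f1` uses a plain F1 over the longest common subsequence, which is an assumption about the weighting. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision, the building block of BLEU."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    cand = "the cat lay on the mat".split()
    print(ngram_precision(cand, ref, 1), ngram_precision(cand, ref, 2))
    print(rouge_l_f1(cand, ref))
```

In the example, one substituted word breaks two of the five bigrams but leaves a five-word common subsequence, which is why ROUGE-L is more tolerant of small edits than higher-order n-gram precision.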

## Running the Code
### Step 1: Clone the Repository
```
# Clone AI API Evaluation Repository

git clone https://github.com/GANGSTER0910/AI_API_EVAL.git
cd AI_API_EVAL

# Install required Python packages
pip install -r requirements.txt

# Provide your Hugging Face API token in the FastAPI.py file, then start the backend
python FastAPI.py
```