Audio-based emotion recognition has many applications in human-computer interaction, mental health assessment, and customer service analytics. This project presents machine learning-based, on-device recognition of emotions (anger, disgust, fear, happiness, neutrality, sadness, and surprise) from audio on low-cost embedded devices. We show how the speaker's mental state influences various acoustic features, such as intensity and shimmer. Classifying emotions from audio is challenging, however, because the same emotion can sound ambiguous across different speakers. Our extensive evaluation with lightweight machine learning models yields an overall F1-score of 61.2% with a response time below 50 ms and 256 KB of memory usage on modern embedded devices. The system overview is shown in the figure above.
To set up the project on your local system, execute the following commands:
```bash
git clone https://github.com/prasenjit52282/embedded-audio-emotion.git
cd embedded-audio-emotion
pip install -r requirements.txt
```
Download the dataset required for training and testing from Kaggle. Ensure that the data is structured as follows:
```
embedded-audio-emotion
└── speech-emotion-recognition-en
    ├── Crema
    ├── Ravdess
    ├── Savee
    └── Tess
```
After downloading the dataset and organizing it in the specified structure, generate the necessary CSV files by running the following command:
```bash
python dataset_generator.py
```
This script will produce two files inside the generated_dataset directory:
- audio_dataframe.csv
- extracted_acoustic_features.csv
These files will serve as inputs for training the models.
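As a rough illustration of what the feature-extraction step might compute, the sketch below derives a few per-clip statistics (an intensity proxy via RMS energy, zero-crossing rate, and MFCC means) with librosa and writes them to a CSV. The feature set, column names, and paths are assumptions and do not necessarily match extractor.py or dataset_generator.py; voice-quality measures such as shimmer would require an additional toolkit.

```python
# Hypothetical feature-extraction sketch; the real feature set lives in
# extractor.py / extracted_acoustic_features.csv, and the names/paths below are assumptions.
import glob
import pandas as pd
import librosa

def extract_features(path, sr=16000):
    """Return a few per-clip acoustic statistics from one audio file."""
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)               # rough intensity proxy
    zcr = librosa.feature.zero_crossing_rate(y)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = {
        "intensity_mean": float(rms.mean()),
        "intensity_std": float(rms.std()),
        "zcr_mean": float(zcr.mean()),
    }
    for i, row in enumerate(mfcc):
        feats[f"mfcc{i + 1}_mean"] = float(row.mean())
    return feats

# Collect features for every clip in the dataset folder (path is an assumption).
rows = [extract_features(f)
        for f in glob.glob("speech-emotion-recognition-en/**/*.wav", recursive=True)]
pd.DataFrame(rows).to_csv("acoustic_features_sketch.csv", index=False)
```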
To train the emotion recognition models, use the generated dataset and run the following command:
```bash
python train.py generated_dataset/extracted_acoustic_features.csv
```
The trained models will be saved in the models directory.
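For intuition, here is a minimal training sketch with a small scikit-learn RandomForest, assuming the CSV holds numeric feature columns plus an "emotion" label column; the label name, model hyperparameters, and serialization format are assumptions and train.py's actual pipeline may differ.

```python
# Minimal training sketch; column names, model size, and output format are assumptions.
import os
import sys
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

df = pd.read_csv(sys.argv[1])                       # e.g. generated_dataset/extracted_acoustic_features.csv
X, y = df.drop(columns=["emotion"]), df["emotion"]  # label column name is an assumption
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Keep the forest small so the exported model fits tight embedded memory budgets.
clf = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)
clf.fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))

os.makedirs("models", exist_ok=True)
joblib.dump(clf, "models/RandomForestClassifier.joblib")  # serialization format is an assumption
```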
Once the models are trained, you can test the emotion recognition on various audio files. Use the following command:
```bash
python predictor.py <audio_file_path> <model_path>
```
Replace <audio_file_path> with the path to the audio file you want to test and <model_path> with the path to the trained model.
A demo example is as follows:
```bash
python predictor.py ./demo_audio_files/DC_h05.wav ./models/RandomForestClassifier
```
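The inference flow conceptually mirrors training: extract the same acoustic features from the input clip, load the trained model, and predict a label. The sketch below illustrates this under the assumptions from the earlier sketches (joblib-serialized model, hypothetical feature names); predictor.py's real loading and feature code may differ.

```python
# Illustrative inference sketch; model format and feature names are assumptions.
import sys
import joblib
import pandas as pd
import librosa

def extract_features(path, sr=16000):
    """Must mirror the features used during training (names here are assumptions)."""
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)
    zcr = librosa.feature.zero_crossing_rate(y)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = {
        "intensity_mean": float(rms.mean()),
        "intensity_std": float(rms.std()),
        "zcr_mean": float(zcr.mean()),
    }
    for i, row in enumerate(mfcc):
        feats[f"mfcc{i + 1}_mean"] = float(row.mean())
    return feats

audio_path, model_path = sys.argv[1], sys.argv[2]
model = joblib.load(model_path)                        # assumes a joblib-serialized scikit-learn model
X = pd.DataFrame([extract_features(audio_path)])       # one row of training-time features
print("predicted emotion:", model.predict(X)[0])
```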
The project directory is organized as follows:
```
embedded-audio-emotion
├── emlearn
├── demo_audio_files
│   └── DC_h05.wav
├── logs
├── webdemo
├── dataset_generator.py
├── em_model.py
├── extractor.py
├── predictor.py
├── README.md
├── requirements.txt
└── train.py
```
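The emlearn directory and em_model.py suggest that the trained classifier is exported to C for the embedded target. Below is a minimal sketch of that step using the emlearn package's convert/save API; the input path, output file name, and the exact flow inside em_model.py are assumptions.

```python
# Sketch of exporting a trained scikit-learn model to C with emlearn;
# the actual conversion in em_model.py may differ.
import joblib
import emlearn

clf = joblib.load("models/RandomForestClassifier.joblib")  # path/format assumed from the training step
cmodel = emlearn.convert(clf)                              # generate portable C inference code
cmodel.save(file="em_model.h", name="emotion_model")       # include this header from the firmware
```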
To refer to this codebase, please cite the following work.
BibTeX Reference:
This work has been accepted as a poster paper in IEEE PerCom 2025 (Work-in-Progress Track). The citation is coming soon.
For questions and general feedback, contact Prasenjit Karmakar.