Speech Emotion Recognition

Implemented deep learning models to recognise emotions from speech. MFCC and spectrogram features extracted from the audio were fed to a CNN built on top of a fine-tuned AlexNet with an attention mechanism, reaching 82% accuracy on the EMO-DB dataset.
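Below is a minimal PyTorch sketch of how such a pipeline could be wired up, not the project's exact configuration: it assumes a torchaudio log-mel front end and a pretrained torchvision AlexNet backbone, with a simple attention layer pooling the convolutional feature map. The attention design, hyperparameters, and class count (EMO-DB labels seven emotions) are illustrative assumptions; MFCCs (e.g. via torchaudio.transforms.MFCC) could be fed in similarly.

```python
import torch
import torch.nn as nn
import torchaudio
from torchvision import models


def spectrogram_input(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Turn a mono waveform of shape (1, time) into a 3-channel log-mel image."""
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=128)(waveform)
    log_mel = torchaudio.transforms.AmplitudeToDB()(mel)  # (1, n_mels, frames)
    return log_mel.repeat(3, 1, 1)                        # AlexNet expects 3 channels


class AttentionPool(nn.Module):
    """Soft attention over the spatial positions of a CNN feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # one score per position

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        weights = torch.softmax(self.score(x).view(b, 1, h * w), dim=-1)
        return (x.view(b, c, h * w) * weights).sum(dim=-1)  # (batch, channels)


class EmotionNet(nn.Module):
    """Fine-tuned AlexNet features + attention pooling + emotion classifier."""

    def __init__(self, num_emotions: int = 7):  # assumption: the 7 EMO-DB classes
        super().__init__()
        backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        self.features = backbone.features    # pretrained conv stack, fine-tuned end to end
        self.attention = AttentionPool(256)  # AlexNet's last conv block outputs 256 channels
        self.classifier = nn.Linear(256, num_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.attention(self.features(x)))


if __name__ == "__main__":
    # Smoke test with one second of dummy 16 kHz audio.
    wav = torch.randn(1, 16_000)
    x = spectrogram_input(wav, 16_000).unsqueeze(0)  # (1, 3, 128, frames)
    logits = EmotionNet()(x)
    print(logits.shape)  # torch.Size([1, 7])
```

Attention pooling replaces AlexNet's fixed-size average pool here, which lets the network weight informative time-frequency regions and accept spectrograms of varying length.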

PyTorch