Emotion Detection from Noisy Voice Recordings Using Machine Learning

Dr. Vijaya Lakshmi, Karthikeya Reddy Velagala, Karangula Yeshwanth Reddy

doi:10.62643/

Abstract

Speech Emotion Detection (SED) has emerged as a pivotal technology for enhancing human–computer interaction by enabling machines to recognize emotions from speech signals. This research presents a robust SED system that uses Convolutional Neural Network (CNN) and CNN-LSTM architectures trained on the RAVDESS dataset, which contains 1,440 high-quality audio samples across eight emotion classes. The system applies comprehensive audio preprocessing, including noise reduction, amplitude normalization, silence trimming, and uniform resampling, to ensure data consistency. Features are extracted as Mel-Frequency Cepstral Coefficients (MFCCs), Mel spectrograms, and Chroma features, which transform raw audio into image-like representations suitable for deep learning models. The CNN captures spatial patterns in the spectrograms, while the CNN-LSTM additionally models temporal dependencies, improving recognition of subtle and sequential emotional cues. The models are optimized with the Adam optimizer, categorical cross-entropy loss, and regularization strategies, and are evaluated using accuracy, precision, recall, F1-score, and confusion matrices. Deployment is realized through a Flask-based web interface that enables real-time emotion recognition from user-uploaded or recorded audio. Experimental results demonstrate that the proposed system achieves high accuracy, robustness to noise, and generalization to unseen speech, making it suitable for applications in mental health monitoring, call centers, virtual assistants, and intelligent human–computer interaction.
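
The preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state which library, frame length, silence threshold, or target sampling rate was used, so the function names (`peak_normalize`, `trim_silence`, `resample_linear`, `preprocess`) and the values 512 samples, -40 dB, and 16 kHz are assumptions chosen for illustration. Noise reduction is omitted, and the resampler uses plain linear interpolation without an anti-aliasing filter, which a production pipeline would normally replace with a dedicated audio library.

```python
import numpy as np


def peak_normalize(signal, peak=1.0):
    """Scale the waveform so its largest absolute sample equals `peak`."""
    signal = np.asarray(signal, dtype=np.float64)
    max_abs = np.max(np.abs(signal))
    if max_abs == 0:
        return signal  # all-zero input: nothing to scale
    return signal * (peak / max_abs)


def trim_silence(signal, frame_len=512, threshold_db=-40.0):
    """Drop leading/trailing frames quieter than threshold_db (relative to the loudest frame).

    Assumed values: frame_len and threshold_db are illustrative, not from the paper.
    Samples after the last complete frame are discarded.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = rms.max()
    if ref == 0:
        return signal
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / ref)
    active = np.flatnonzero(level_db > threshold_db)
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return signal[start:end]


def resample_linear(signal, orig_sr, target_sr):
    """Resample by linear interpolation (no anti-aliasing; illustration only)."""
    if orig_sr == target_sr:
        return signal
    n_out = int(round(len(signal) * target_sr / orig_sr))
    old_t = np.arange(len(signal)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, signal)


def preprocess(signal, orig_sr, target_sr=16000):
    """Normalize, trim silence, then resample to a common rate (assumed 16 kHz)."""
    trimmed = trim_silence(peak_normalize(signal))
    return resample_linear(trimmed, orig_sr, target_sr)


# Usage: a synthetic 1 s tone padded with 0.5 s of silence on each side, at 48 kHz.
sr = 48000
tone = 0.3 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
raw = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
clean = preprocess(raw, orig_sr=sr)
```

After this step every clip shares one sampling rate and amplitude scale, which is what lets the later feature extraction produce spectrogram-like inputs of comparable shape and range.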