Audio-Based Emotion Recognition and Analysis

In the rapidly evolving landscape of machine learning and audio processing, the capability to recognize emotions from audio data has transformative potential. This study investigated and optimized that capability using the R language as the primary analytical tool. With R, raw audio data were transformed into features that capture emotion-related cues, and three machine learning models (SVM, KNN, and XGBoost) were trained, with particular emphasis on feature selection. Preliminary findings highlight the distinct preferences of each model: KNN performed best on lower-dimensional data, SVM thrived with a richer feature set, and XGBoost, while promising, posed challenges that warrant further exploration. Crucially, the study identified specific features, notably BFCC, spectral contrast, and RMSE, as the most informative for deciphering emotion from audio. However, certain features known to be pertinent to emotion recognition, such as tonnetz and chroma, could not be extracted due to limitations of the available R tooling. This finding both underscores the challenges encountered and points to enhancements the R ecosystem could introduce. In essence, this study charts the current territory of emotion recognition in R and illuminates pathways for future exploration and improvement.
