A WavLM-Based Greedy Forest Framework for Robust In-Vehicle Audio Event Detection and Driver Safety Enhancement
DOI:
https://doi.org/10.62643/ijerst.2026.v22.n2.pp40-49
Keywords:
In-Vehicle Audio Event Detection, WavLM, GFC, Acoustic Signal Classification, Driver Alertness Systems, Intelligent Vehicle Safety
Abstract
Road safety and driver alertness systems increasingly require reliable perception beyond vision, because critical vehicle events are often conveyed through sound. Studies indicate that a significant portion of driving-relevant cues, such as engine startup anomalies, braking irregularities, and idle-state faults, are acoustic in nature and may be missed under high cabin noise or driver distraction. With the rapid growth of intelligent vehicles, there is a strong need for robust in-vehicle audio event detection systems that can accurately classify multiple vehicle states in real time. Despite this need, existing approaches largely depend on manual monitoring or on simple rule-based and traditional classifiers, which suffer from limited accuracy and poor generalization. Manual listening and annotation are time-consuming, subjective, and unsuitable for continuous real-time deployment. To address these limitations, this work proposes an In-Vehicle Audio Event Detection framework that integrates WavLM-based feature extraction with a proposed Greedy Forest Classifier (GFC). The system begins with an organized audio dataset representing four vehicle states: braking state, combined, idle state, and startup state. After dataset preprocessing, WavLM is employed to extract high-level, noise-robust acoustic embeddings that capture both temporal and contextual sound characteristics. These embeddings are labeled according to the dataset folder names, ensuring clear class-wise representation. The extracted embeddings are then split into training and testing sets to enable fair evaluation. Performance is compared against three existing classifiers, Restricted Boltzmann Machine (RBM), Learning Vector Quantization (LVQ), and Linear Discriminant Analysis (LDA), to highlight their limitations in handling high-dimensional audio representations.
The proposed GFC leverages ensemble-based greedy learning to improve discriminative power, robustness, and generalization across varying in-vehicle acoustic environments. Experimental results demonstrate that the proposed approach achieves superior classification accuracy and reliability relative to these baselines, making it suitable for real-time driver alertness and vehicle safety applications.
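
The data-organization steps described in the abstract (class labels taken from folder names, followed by a train/test split) could be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the file layout, the helper names `label_from_path` and `stratified_split`, the 80/20 ratio, and the per-class (stratified) splitting are all assumptions, and the WavLM embedding step and the GFC itself are deliberately left out.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def label_from_path(path: str) -> str:
    """Use the name of the parent folder as the class label."""
    return PurePosixPath(path).parent.name


def stratified_split(paths, test_fraction=0.2, seed=0):
    """Split file paths into train/test lists, class by class.

    Splitting per class keeps each vehicle state represented in both
    sets. The 0.2 test fraction is an assumption, not a reported value.
    """
    by_class = defaultdict(list)
    for p in paths:
        by_class[label_from_path(p)].append(p)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        items = sorted(by_class[label])
        rng.shuffle(items)
        # Hold out at least one clip per class.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical layout: one folder per vehicle state, ten clips each.
CLASSES = ["braking state", "combined", "idle state", "startup state"]
paths = [f"dataset/{c}/clip_{i:03d}.wav" for c in CLASSES for i in range(10)]

train, test = stratified_split(paths, test_fraction=0.2, seed=0)
print(len(train), len(test))
```

In a full pipeline, each path would first be mapped to a WavLM embedding, and the same split would then be applied to the embedding vectors so that every classifier is trained and evaluated on identical partitions.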
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.













