Accent Classification in Industrial Voice-Controlled IoT Systems Using i-Vector Framework
Abstract
Accent classification in industrial voice-controlled IoT systems is essential for accurate speech recognition and safe operation in multilingual industrial environments. With the rise of voice-activated machinery, recognizing the diverse accents of operators has become critical for improving command interpretation and operational efficiency. Existing accent classification methods often rely solely on conventional i-vector frameworks or basic acoustic features, which struggle to maintain accuracy in noisy industrial settings and under varying phonetic patterns. To address these limitations, this study proposes an Accent-Adaptive Deep i-Vector Fusion with Convolutional Bottleneck Features (CABi-Vector) framework. The method first extracts MFCCs or log-mel spectrograms from speech signals and passes them through a convolutional neural network to obtain deep bottleneck features that capture fine-grained phonetic-acoustic patterns. These embeddings are fused with traditional i-vectors to form accent-adaptive representations, which are then classified by a lightweight neural network optimized for real-time industrial deployment. The proposed CABi-Vector framework makes accent recognition more robust in challenging industrial environments and allows voice-controlled IoT systems to adapt dynamically to operator accents. Experimental results demonstrate improved classification accuracy, fewer misinterpreted commands, and increased operational safety and efficiency compared to conventional methods.
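As a rough illustration of the fusion and classification stages described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the i-vectors and the CNN bottleneck embeddings have already been extracted, and it stands in random arrays for both. The fusion rule (length normalization followed by concatenation), the single hidden layer, all dimensions, and the function names `fuse` and `classify` are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse(ivectors, bottleneck):
    """Assumed fusion rule: length-normalize each stream, then concatenate."""
    return np.concatenate(
        [l2_normalize(ivectors), l2_normalize(bottleneck)], axis=-1
    )


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify(fused, w1, b1, w2, b2):
    """Lightweight classifier: one ReLU hidden layer, then softmax."""
    hidden = np.maximum(fused @ w1 + b1, 0.0)
    return softmax(hidden @ w2 + b2)


# Placeholder data: all sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
batch, ivec_dim, bn_dim, hidden_dim, n_accents = 4, 400, 64, 32, 5

ivectors = rng.standard_normal((batch, ivec_dim))    # stand-in for i-vectors
bottleneck = rng.standard_normal((batch, bn_dim))    # stand-in for CNN features

fused = fuse(ivectors, bottleneck)

# Untrained random weights, used only to show the shapes involved.
w1 = rng.standard_normal((ivec_dim + bn_dim, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, n_accents)) * 0.1
b2 = np.zeros(n_accents)

probs = classify(fused, w1, b1, w2, b2)
predicted = probs.argmax(axis=-1)  # index of the most likely accent class
```

Normalizing each stream before concatenation keeps the higher-dimensional i-vector from dominating the fused representation purely by scale; a trained system might instead learn the weighting between the two streams.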