Article

Layer-Wise Probing of Paralinguistic Attributes in Fine-Tuned Whisper for Kazakh Speech

Aimoldir Aldabergen, School of Sciences and Humanities, Nazarbayev University, Almaty, Kazakhstan
Bakdaulet Kynabay, Faculty of Engineering and Natural Sciences, SDU University, Almaty, Kazakhstan
Shirali Kadyrov, Department of General Education, New Uzbekistan University, Tashkent, Uzbekistan

Engineering Technology & Applied Science Research (journal), 2026

ABI

Abstract

Large pre-trained speech models such as Whisper are now widely used for speech recognition and related tasks. How paralinguistic information, such as emotion and speaker characteristics, is distributed across model layers remains unclear, particularly for low-resource languages. This study probes each layer of the Kazakh-adapted Whisper encoder to determine how well it encodes emotion, speaker identity, age, and gender. We extract fixed representations from every encoder layer and evaluate them with both linear and Multilayer Perceptron (MLP) probes. Performance is measured with accuracy, macro-averaged F1-score (Macro-F1), and balanced accuracy, and non-parametric statistical tests assess the significance of differences across layers. KazEmoTTS is used for emotion probing, whereas Common Voice (Kazakh) data are used for speaker identification and demographic attribute analysis. The results show that age and gender information is strongly present at all layers, with little change across depth, whereas speaker identity shows statistically significant but small variation between layers. Emotion information is concentrated in the model's middle layers, where probing is most effective. These findings clarify how Whisper represents Kazakh speech and can help researchers choose appropriate layers for paralinguistic speech applications.
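The probing procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `probe_layers`, the synthetic features, the train/test split, and all probe hyperparameters are assumptions made for illustration. In a real experiment, the per-layer features would be fixed representations taken from the Whisper encoder rather than random data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def probe_layers(layer_feats, labels, seed=0):
    """Fit a linear and an MLP probe on each layer's features and score them.

    layer_feats: array of shape (n_layers, n_samples, dim), one fixed
                 representation per utterance and layer (hypothetical input).
    labels:      array of shape (n_samples,) with integer class labels.
    """
    # Use the same stratified split for every layer so scores are comparable.
    idx = np.arange(len(labels))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.3, stratify=labels, random_state=seed
    )
    results = []
    for layer, feats in enumerate(layer_feats):
        # Probe sizes and iteration limits are illustrative assumptions.
        probes = {
            "linear": LogisticRegression(max_iter=1000),
            "mlp": MLPClassifier(
                hidden_layer_sizes=(64,), max_iter=500, random_state=seed
            ),
        }
        for name, clf in probes.items():
            model = make_pipeline(StandardScaler(), clf)
            model.fit(feats[train_idx], labels[train_idx])
            pred = model.predict(feats[test_idx])
            y_true = labels[test_idx]
            results.append({
                "layer": layer,
                "probe": name,
                "accuracy": accuracy_score(y_true, pred),
                "macro_f1": f1_score(y_true, pred, average="macro"),
                "balanced_accuracy": balanced_accuracy_score(y_true, pred),
            })
    return results


# Synthetic stand-in data: 4 "layers", 200 "utterances", 3 classes.
rng = np.random.default_rng(0)
n_layers, n_samples, dim, n_classes = 4, 200, 16, 3
labels = rng.integers(0, n_classes, size=n_samples)
layer_feats = rng.normal(size=(n_layers, n_samples, dim))
# Inject class information into layer 2 only, so that layer should probe best.
layer_feats[2, :, 0] += 4.0 * labels

results = probe_layers(layer_feats, labels)
for r in results:
    print(r)
```

Comparing the scores per layer shows which depth makes an attribute most accessible to a probe. A gap between the linear and MLP probe at a given layer suggests the attribute is encoded there in a form that is not linearly separable.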

Topics

Speech Recognition and Synthesis; Face Recognition and Analysis; Authorship Attribution and Profiling

Identifiers

DOI: 10.48084/etasr.17076

Citations and references

0 citations; 7 references used
