Article

Deep Learning-Powered Gesture and Speech Recognition in Augmented Reality Interfaces

Odilbek Kosimov, Termez University of Economics and Service, Department of Information Technology and Exact Sciences, Termez, Uzbekistan
Matyakubov Nurbek, Urgench Innovation University, Social-Humaniratidan Department, Urgench City, Uzbekistan
Otajanov Olimboy, Urgench State University, Department of Pedagogy and Psychology, Urgench, Uzbekistan
Aditya Kumar Sharma, Tula’s Institute, Department of Computer Science and Engineering, Dehradun, India, 248197
Feruza Jumaniyazova, Mamun University, Department Romano-Germanic Philology, Uzbekistan
Farrukh Nurullayev, Bukhara State Pedagogical Institute, Department of Music Education, Bukhara, Uzbekistan
2025
ABI

Abstract

Augmented Reality (AR) interfaces demand intuitive and natural human-computer interaction modalities to enhance user experience and accessibility. Traditional AR input methods often rely on handheld controllers or touch-based interactions, which can limit the immersive potential of AR applications. This research develops and evaluates a multimodal deep learning framework that integrates gesture and speech recognition for seamless AR interface control. The methodology employs a hybrid architecture that combines Convolutional Neural Networks (CNNs) for real-time hand gesture recognition with Transformer-based models for continuous speech recognition, integrated through a fusion layer that processes both modalities simultaneously. The system was trained on a custom dataset of 50,000 gesture samples and 100,000 speech utterances collected from 200 participants across diverse demographic groups. The experimental findings indicate that the proposed system attains 94.7% accuracy in gesture recognition and 96.2% in speech recognition, rising to 97.8% when both modalities are used jointly. The framework responds with an average latency of 45 ms, which supports real-time interaction in AR applications. The study contributes to the development of natural user interfaces for AR, with implications for accessibility, productivity applications, and immersive computing experiences.
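
The abstract does not specify how the fusion layer combines the two recognizers, so the following is only a minimal sketch of one simple possibility: late fusion by weighted averaging of per-command probabilities. It is a stand-in for illustration, not the authors' learned fusion layer. The function names, the `gesture_weight` parameter, the command labels, and the logit values are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(gesture_logits, speech_logits, gesture_weight=0.5):
    """Weighted average of the two modalities' class probabilities.

    Assumption: both recognizers score the same ordered set of commands.
    This is a simple stand-in, not the fusion layer from the paper.
    """
    if len(gesture_logits) != len(speech_logits):
        raise ValueError("both modalities must score the same commands")
    if not 0.0 <= gesture_weight <= 1.0:
        raise ValueError("gesture_weight must lie in [0, 1]")
    gesture_probs = softmax(gesture_logits)
    speech_probs = softmax(speech_logits)
    return [
        gesture_weight * g + (1.0 - gesture_weight) * s
        for g, s in zip(gesture_probs, speech_probs)
    ]


def predict(commands, gesture_logits, speech_logits, gesture_weight=0.5):
    """Return the most probable command and its fused probability."""
    fused = late_fusion(gesture_logits, speech_logits, gesture_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return commands[best], fused[best]


if __name__ == "__main__":
    # Hypothetical scores: the gesture model hesitates between "select"
    # and "zoom", while the speech model clearly hears "zoom".
    commands = ["select", "zoom", "dismiss"]
    gesture_logits = [2.0, 1.9, 0.1]
    speech_logits = [0.2, 3.0, 0.1]
    print(predict(commands, gesture_logits, speech_logits))
```

With these made-up scores the gesture recognizer alone would narrowly pick "select", while the fused prediction is "zoom". That illustrates, under the stated assumptions, why a joint decision can be more accurate than either modality on its own.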
