Article

GazeCapsNet: A Lightweight Gaze Estimation Framework

Shakhnoza Muksimova, Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Republic of Korea
Yakhyokhuja Valikhujaev
Sabina Umirzakova, Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Republic of Korea
Jushkin Baltayev, Department of Information Systems and Technologies of the Tashkent State University of Economic, Tashkent 100066, Uzbekistan
Young Im Cho, Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Republic of Korea

Sensors (journal), 2025, en

Abstract

Gaze estimation is increasingly pivotal in applications spanning virtual reality, augmented reality, and driver monitoring systems, which call for efficient yet accurate models for mobile deployment. Current methods often fall short, particularly in mobile settings, because of their extensive computational requirements or their reliance on intricate pre-processing. To address these limitations, we present GazeCapsNet, a gaze estimation framework that combines capsule networks with lightweight architectures such as MobileNet v2, MobileOne, and ResNet-18. The framework eliminates the need for facial landmark detection and significantly improves real-time operability on mobile devices. GazeCapsNet adapts capsule networks to gaze estimation through Self-Attention Routing (SAR), which replaces iterative routing with a lightweight attention-based mechanism that dynamically allocates computational resources, improving both accuracy and efficiency. Our results show that GazeCapsNet achieves state-of-the-art (SOTA) performance on several benchmark datasets, including ETH-XGaze and Gaze360, reducing mean angular error (MAE) by up to 15% compared to existing models. Furthermore, the model processes each frame in 20 milliseconds while requiring only 11.7 million parameters, which makes it well suited to real-time applications in resource-constrained environments. These findings underscore the efficacy and practicality of GazeCapsNet and establish a new standard for mobile gaze estimation technologies.
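The abstract reports accuracy as mean angular error and speed as milliseconds per frame. As a minimal sketch of how these two quantities are commonly computed, and not the authors' evaluation code, the following assumes gaze directions are given as non-zero 3D vectors. The function names `angular_error_deg`, `mean_angular_error_deg`, and `frames_per_second` are hypothetical and do not come from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    if norm_pred == 0.0 or norm_target == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp so floating-point rounding cannot push acos outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truth."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


def frames_per_second(latency_ms):
    """Throughput implied by a per-frame latency, assuming sequential frames."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Illustrative vectors only, not data from the paper.
    preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
    targets = [(0.0, 0.0, -1.0), (0.0, 1.0, 0.0)]
    print(mean_angular_error_deg(preds, targets))
    print(frames_per_second(20.0))
```

Under the sequential-processing assumption, the reported 20 milliseconds per frame corresponds to 1000 / 20 = 50 frames per second.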

Topics

Gaze Tracking and Assistive Technology; EEG and Brain-Computer Interfaces; Advanced Computing and Algorithms

Identifiers

DOI: 10.3390/s25041224

Citations and references

Citations: 0
References used: 43
