Article

Intrinsic Topology and Multi-Scale Temporal Modeling for Skeleton-Based Human Action Recognition in Smart Surveillance Systems

Shakir Khan, UCRD, Chandigarh University, Mohali 140413, India
Divyanshu Sinha, Amrita School of Artificial Intelligence, Amrita Hospital, Mata Amritanandamayi Marg, Sector 88, Faridabad, Haryana 121002, India
Fatimah Alhayan, Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Mukesh Soni, Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India
Navruzbek Shavkatov, Department of Corporate Finance and Securities, Tashkent State University of Economics, Tashkent, Uzbekistan
Mohd Fazil, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
Toufik Mzili, ELITES Laboratory, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida 24300, Morocco
Faiz Ullah, LAROSERIE Laboratory, Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida 24300, Morocco
Faheem Ahmad Reegu, College of Engineering and Computer Science, Department of Electrical and Electronics Engineering, Jazan University, Saudi Arabia

International Journal of Humanoid Robotics (journal, 2026, English)

ABI

Abstract

Skeleton-based action recognition has become an essential component of intelligent surveillance systems because it is robust to variations in illumination, background, and appearance. Graph Convolutional Networks (GCNs) have shown considerable promise in modeling human motion patterns from skeletal data. Nevertheless, current GCN-based methods frequently neglect intrinsic topological relationships, have limited temporal modeling capacity, and do not adequately represent the functional interdependence between joints and bones. We propose an approach to human action recognition that integrates intrinsic skeletal structure with multi-scale temporal dynamics and is designed for real-time surveillance applications. The model incorporates an intrinsic topological space graph convolution module that uses a multi-head self-attention mechanism together with a shared topological structure to infer latent contextual relationships among joints. In parallel, a multi-scale temporal convolution module captures both fine- and coarse-grained motion patterns across actions of different durations. To improve feature interaction and represent the structural intricacies of human movement more accurately, the model adds a joint–bone interaction bridge that fuses and exchanges information between the joint and bone streams. Evaluated on the NTU RGB+D 60 and NTU RGB+D 120 datasets, the proposed method attains state-of-the-art accuracy: 91.5% (cross-subject, CS) and 96.9% (cross-view, CV) on NTU RGB+D 60, and 89.0% (cross-subject, C-Sub) and 90.8% (cross-setup, C-Set) on NTU RGB+D 120. These results indicate that the method extracts comprehensive spatiotemporal features from skeletal data and offers a dependable, scalable solution for intelligent multisensor monitoring systems.
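As a rough illustration of two ideas named in the abstract, the sketch below derives bone vectors from joint coordinates and applies one temporal kernel at several dilation rates. It is not the authors' implementation: the five-joint toy skeleton, the fixed difference kernel, and all function names are assumptions made for this example, whereas the model described above learns its convolution weights and uses the 25-joint NTU RGB+D skeleton.

```python
# Hypothetical 5-joint toy skeleton given as (child, parent) index pairs.
# This tiny tree is an assumption for illustration only.
TOY_BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]


def bone_vectors(joints, bones=TOY_BONES):
    """Return one vector per bone: child joint position minus parent joint position."""
    return [
        tuple(c - p for c, p in zip(joints[child], joints[parent]))
        for child, parent in bones
    ]


def dilated_temporal_conv(signal, kernel, dilation):
    """'Valid' 1D cross-correlation over time with the given dilation."""
    span = (len(kernel) - 1) * dilation  # frames covered beyond the first tap
    return [
        sum(w * signal[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


def multi_scale_temporal(signal, kernel=(-1, 0, 1), dilations=(1, 2)):
    """Apply one kernel at several dilations: small for fine motion, large for coarse."""
    return {d: dilated_temporal_conv(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    # One frame of (x, y, z) joint positions for the toy skeleton.
    frame = [(0, 0, 0), (0, 2, 0), (0, 3, 0), (-1, 2, 0), (1, 2, 0)]
    print(bone_vectors(frame))

    # x-coordinate of a single joint over seven frames.
    trajectory = [0, 1, 2, 3, 4, 5, 6]
    print(multi_scale_temporal(trajectory))
```

With the difference kernel used here, each dilation measures displacement over a different time span, which is one simple way to see how short and long temporal windows respond to the same motion.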

Topics

Human Pose and Action Recognition; Gait Recognition and Analysis; Context-Aware Activity Recognition Systems

Identifiers

DOI: 10.1142/s0219843626400128

Citations and references

Citations: 0; references used: 36
