Intrinsic Topology and Multi-Scale Temporal Modeling for Skeleton-Based Human Action Recognition in Smart Surveillance Systems
Abstract
Skeleton-based action recognition has become an essential component of intelligent surveillance systems because it is robust to variations in illumination, background, and appearance. Graph Convolutional Networks (GCNs) have shown considerable promise in modeling human motion patterns from skeletal data. However, existing GCN-based methods often neglect intrinsic topological relationships, have limited temporal modeling capacity, and do not adequately represent the functional relationship between joints and bones. We propose an approach to human action recognition that integrates the intrinsic topology of the skeleton with multi-scale temporal dynamics and is designed for real-time surveillance applications. The model includes an intrinsic-topology spatial graph convolution module that uses a multi-head self-attention mechanism together with a shared topology to infer latent contextual relationships among joints. In parallel, a multi-scale temporal convolution module captures both fine- and coarse-grained motion patterns across actions of different durations. To improve feature interaction and represent the structural complexity of human movement more accurately, the model adds a joint-bone interaction bridge that fuses and exchanges information between the joint and bone representations. Evaluated on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets, the proposed method attains state-of-the-art accuracy: 91.5% (cross-subject, CS) and 96.9% (cross-view, CV) on NTU-RGB+D 60, and 89.0% (cross-subject, C-Sub) and 90.8% (cross-setup, C-Set) on NTU-RGB+D 120. These results demonstrate that the proposed method extracts comprehensive spatiotemporal features from skeletal data and offers a dependable, scalable solution for intelligent multisensor monitoring systems.
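As an illustration of the multi-scale temporal idea only, the following minimal sketch applies the same small temporal kernel at several dilation rates to one joint coordinate over time, so that each branch responds to motion over a different time span. It assumes that the scales are realized as dilated convolutions; the abstract does not specify the branch design, and the function names, kernel weights, and dilation rates here are hypothetical choices, not the configuration of the proposed model.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution along the time axis.

    `signal` is one joint coordinate over time (list of floats).
    Implemented as cross-correlation, which equals convolution for the
    symmetric kernel used below.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            idx = t + (i - center) * dilation  # frame sampled by this kernel tap
            if 0 <= idx < len(signal):         # taps outside the clip count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_scale_temporal(signal, kernel=(0.25, 0.5, 0.25), dilations=(1, 2, 4)):
    """Run one branch per dilation rate (hypothetical rates, for illustration).

    Small dilations see adjacent frames (fine-grained motion); large
    dilations span more frames with the same number of weights
    (coarse-grained motion). A real model would learn the kernels and
    fuse the branches, e.g. by channel concatenation.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


# A single motion "spike" at frame 4 of a 9-frame clip.
impulse = [0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
branches = multi_scale_temporal(impulse)
for d, response in branches.items():
    print(d, response)
```

With this input, the spike is spread over frames 3 to 5 by the branch with dilation 1 and over frames 0 to 8 by the branch with dilation 4, which shows how the temporal extent covered by a branch grows with its dilation rate while the number of weights stays fixed.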