Human Pose Estimation and Skeleton-Based Action Recognition: A Systematic Review of 2D/3D Deep Learning Approaches
Abstract
This article presents a systematic review of modern 2D and 3D deep learning approaches to Human Pose Estimation (HPE) and Skeleton-Based Action Recognition (SBAR). The review follows the SALSA methodology and analyzes works published in leading scientific databases. It compares approaches that detect human joint locations in images and video sequences, reconstruct the skeletal structure, and model actions in the spatial and spatiotemporal (ST) domains. The article examines the architectures of 2D and 3D HPE models, joint occlusion in multi-person scenes, real-time requirements, and the feasibility of deployment on embedded devices. It also analyzes the spatial, spatiotemporal, and graph-based features used in SBAR systems and their impact on computational complexity and energy efficiency. Model performance is compared using evaluation metrics such as MPJPE (Mean Per Joint Position Error), AP (Average Precision), RMSE (Root Mean Square Error), and the Pearson correlation coefficient. The results show that skeleton-based approaches are effective for real-time and resource-constrained systems. However, selecting the optimal model and features requires a trade-off between accuracy, computational complexity, and energy efficiency. Finally, the paper identifies promising architectures, features, and hardware adaptation strategies for deploying HPE and SBAR systems in real-world environments.
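To make the regression-style metrics named above concrete, the following is a minimal sketch of how MPJPE, RMSE, and the Pearson correlation coefficient can be computed. It is illustrative only, not the evaluation code used by any of the reviewed works. It assumes that predicted and ground-truth 3D joints are already in the same coordinate frame and units (for example, millimetres), and that any root-joint or Procrustes alignment required by a particular benchmark protocol has already been applied. AP is omitted because it additionally depends on matching predictions to ground-truth instances under a similarity threshold. The function names `mpjpe`, `rmse`, and `pearson` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical type aliases: one joint is an (x, y, z) tuple; one frame is a list of joints.
Joint = Sequence[float]
Frame = Sequence[Joint]


def mpjpe(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean Per Joint Position Error: the Euclidean distance between each
    predicted joint and its ground-truth joint, averaged over all joints and frames.
    Assumes both inputs are already aligned as required by the evaluation protocol."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must contain the same non-zero number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, gt):
        if len(frame_pred) != len(frame_gt):
            raise ValueError("each frame must contain the same number of joints")
        for p, g in zip(frame_pred, frame_gt):
            total += math.dist(p, g)  # Euclidean distance for a single joint
            count += 1
    return total / count


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root Mean Square Error between two equally long sequences of scalars."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient: covariance divided by the product
    of the standard deviations (the 1/n factors cancel out)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # One frame with two joints: the first is exact, the second is 5 units off.
    print(mpjpe([[(0, 0, 0), (3, 4, 0)]], [[(0, 0, 0), (0, 0, 0)]]))  # 2.5
    print(rmse([1, 2, 3], [1, 2, 5]))
    print(pearson([1, 2, 3], [2, 4, 6]))  # approximately 1.0
```

Lower MPJPE and RMSE indicate better predictions, whereas a Pearson coefficient closer to 1 indicates stronger linear agreement, for example between predicted and reference quality scores.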