KMS Chongqing Institute of Green and Intelligent Technology, CAS
Title | Hierarchical Spatial-Temporal Adaptive Graph Fusion for Monocular 3D Human Pose Estimation
Authors | Zhang, Lijun1,2; Lu, Feng3,4; Zhou, Kangkang1,2; Zhou, Xiang-Dong1,2; Shi, Yu1,2
Year | 2024
Abstract | Single-view 3D human pose estimation (HPE) based on Graph Convolutional Networks currently suffers from problems such as insufficient feature representation and depth ambiguity. To address these issues, this letter proposes a hierarchical spatial-temporal adaptive graph fusion framework to improve monocular 3D HPE performance. First, to enhance the spatial semantic feature representation of human nodes, a progressive adaptive graph feature capture strategy is developed, which adaptively constructs global-to-local attention graph features of all human joints in a coarse-to-fine manner. A spatial-temporal attention fusion module is then constructed to model long-term sequential dependencies and mitigate depth ambiguity. The temporal attention factors of related frames are captured and used as intermediate supervision of joint depth. The spatial semantic information among all joints in the same frame and the temporal contextual knowledge of joints across relevant frames are fused to build spatial-temporal correlations and optimize the final features. Extensive experiments on two popular benchmarks show that our method outperforms several state-of-the-art approaches and improves 3D HPE performance.
Keywords | 3D human pose estimation; attention mechanism; graph convolutional network; spatial-temporal fusion
DOI | 10.1109/LSP.2023.3339060 |
Journal | IEEE SIGNAL PROCESSING LETTERS
ISSN | 1070-9908 |
Volume | 31
Pages | 61-65
Corresponding Author | Zhou, Xiang-Dong(zhouxiangdong@cigit.ac.cn)
Indexed By | SCI
WOS ID | WOS:001138710200016
Language | English