Multi-Scale Spatio-Temporal Aggregation Network for Human Motion Prediction

Bibliographic Details
Published in: 2023 18th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), pp. 410-417
Main Authors: Su, Haoyu; Liu, Shenglan; Gao, Zewen; Dong, Yifeng; Yang, Junshi; Ding, Suhao
Format: Conference Proceeding
Language: English
Published: IEEE 17-11-2023
Description
Summary: Human motion prediction is a fundamental problem in computer vision that aims to predict future motion sequences from historical ones. Recent work has shown that Graph Convolutional Networks (GCNs) perform well at modeling the correlations between human joints, and Temporal Convolutional Networks (TCNs) are widely recognized for solving sequence problems. However, the locality of convolution operations makes it difficult to model relations between distant joints and long-term temporal information. To solve this problem, we propose a Multi-Scale Spatio-Temporal Graph Convolution (MST-GC) module and a Multi-Scale Temporal Convolution (M-TC) module, which decompose the local convolution into a set of sub-convolutions that allow each joint to establish connections with distant nodes in both the spatial and temporal dimensions. This enlarges the receptive field of the model and better captures the spatio-temporal dependencies of human motion sequences. By combining these two modules, we further propose a novel Multi-Scale Spatio-Temporal Aggregation Network (MSTAN). Extensive experiments show that the proposed MSTAN outperforms state-of-the-art methods in both short- and long-term motion prediction on the Human3.6M and AMASS datasets.
DOI: 10.1109/ISKE60036.2023.10481288