Pedestrian trajectory prediction based on dual-attention spatial-temporal graph convolutional network
Xiaoqian XIANG¹, Jing CHEN¹,²,*
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China
There are two major challenges in current research on pedestrian trajectory prediction: 1) how to effectively extract the spatial-temporal correlation between preceding and subsequent frames of pedestrian motion; 2) how to avoid the performance degradation caused by sampling bias during trajectory sampling. To address these two problems, a pedestrian trajectory prediction model based on a dual-attention spatial-temporal graph convolutional network and a purposive sampling network was proposed. Temporal attention was used to capture the correlation between preceding and subsequent frames, and spatial attention was used to capture the correlation among surrounding pedestrians; spatial-temporal graph convolution then further extracted the spatial-temporal correlations between pedestrians. Meanwhile, a learnable sampling network was introduced to resolve the uneven distribution caused by random sampling. Extensive experiments showed that the accuracy of the method was comparable to that of current state-of-the-art methods on the ETH and UCY datasets, while the number of model parameters and the inference time were reduced by 1.65×10⁴ and 0.147 s, respectively. On the SDD dataset, the accuracy decreased slightly, but the number of model parameters was reduced by 3.46×10⁴, showing a good balance between accuracy and efficiency. The proposed model provides a new and effective approach for pedestrian trajectory prediction.
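The two components named above, the dual-attention front end and the learnable (purposive) sampling network, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's implementation: the class names, layer sizes, latent dimensionality, and the sigmoid squashing of sample points are all assumed here, and the attention layers are ordinary multi-head self-attention applied over the frame axis (TAtt) and the pedestrian axis (SAtt).

```python
import torch
import torch.nn as nn

class AxisAttention(nn.Module):
    """Multi-head self-attention over one axis of the trajectory tensor."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)  # self-attention: query = key = value
        return out

class DualAttention(nn.Module):
    """TAtt over frames, SAtt over pedestrians, ahead of the ST-GCN backbone."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.t_att = AxisAttention(dim, heads)
        self.s_att = AxisAttention(dim, heads)

    def forward(self, x):
        # x: (N, T, D) -- N pedestrians, T observed frames, D features
        x = x + self.t_att(x)                                   # across frames
        x = x + self.s_att(x.transpose(0, 1)).transpose(0, 1)   # across pedestrians
        return x

class PurposiveSampler(nn.Module):
    """Learnable sampler replacing i.i.d. random draws (NPSN-style [11])."""
    def __init__(self, feat_dim: int = 64, n_samples: int = 20, latent: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_samples * latent))
        self.shape = (n_samples, latent)

    def forward(self, h):
        # h: (N, feat_dim) per-pedestrian history encoding
        z = self.mlp(h).view(h.size(0), *self.shape)
        # Squash points into the unit hypercube; downstream they would be
        # mapped through the predicted distribution's inverse CDF (assumed).
        return torch.sigmoid(z)

feats = torch.randn(8, 8, 64)                   # 8 pedestrians, 8 frames
enc = DualAttention()(feats)                    # (8, 8, 64)
samples = PurposiveSampler()(enc.mean(dim=1))   # (8, 20, 2) sample points
```

In the paper's pipeline, the attended features feed the spatial-temporal graph convolution backbone, and the learned sample points replace i.i.d. random draws when generating multiple candidate futures.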
Fig.2 Structure of the spatial-temporal graph convolution
Fig.3 Structure of the graph attention network
| Model | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PITF[16] | 2019 | 0.73/1.65 | 0.30/0.59 | 0.60/1.27 | 0.38/0.81 | 0.31/0.68 | 0.46/1.00 |
| STGAT[17] | 2019 | 0.50/0.84 | 0.26/0.46 | 0.51/1.07 | 0.33/0.64 | 0.30/0.61 | 0.38/0.72 |
| BIGAT[14] | 2019 | 0.69/1.29 | 0.49/1.01 | 0.55/1.32 | 0.30/0.62 | 0.36/0.75 | 0.48/1.00 |
| Social-STGCNN[7] | 2020 | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75 |
| PECNET[15] | 2020 | 0.54/0.87 | 0.18/0.24 | 0.35/0.60 | 0.22/0.39 | 0.17/0.30 | 0.29/0.48 |
| STAR[18] | 2020 | 0.36/0.65 | 0.17/0.36 | 0.31/0.62 | 0.26/0.55 | 0.22/0.46 | 0.26/0.53 |
| SGCN[8] | 2021 | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65 |
| AGENTFORMER[19] | 2021 | 0.45/0.75 | 0.14/0.22 | 0.25/0.45 | 0.18/0.30 | 0.14/0.24 | 0.23/0.39 |
| SIT[20] | 2022 | 0.42/0.60 | 0.21/0.37 | 0.51/0.94 | 0.20/0.34 | 0.17/0.30 | 0.30/0.51 |
| Social-STGCNN+NPSN[11] | 2022 | 0.44/0.65 | 0.21/0.34 | 0.27/0.44 | 0.24/0.43 | 0.21/0.37 | 0.28/0.44 |
| SGCN+NPSN[11] | 2022 | 0.35/0.58 | 0.15/0.25 | 0.22/0.39 | 0.18/0.31 | 0.13/0.24 | 0.21/0.36 |
| Graph-TERN[21] | 2023 | 0.42/0.58 | 0.14/0.23 | 0.26/0.45 | 0.21/0.37 | 0.17/0.29 | 0.24/0.38 |
| Proposed model | — | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 |

Tab.1 Comparison of results (ADE/FDE) on ETH and UCY datasets
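For reference, ADE is the mean Euclidean distance between predicted and ground-truth positions over the prediction horizon, and FDE is that distance at the final step. Below is a minimal sketch of the usual best-of-K evaluation; K = 20 is the common protocol on these benchmarks, and whether this paper follows it exactly is an assumption here.

```python
import torch

def ade_fde(pred: torch.Tensor, gt: torch.Tensor) -> tuple[float, float]:
    """Best-of-K ADE/FDE for K sampled trajectories.

    pred: (K, T, 2) sampled future trajectories; gt: (T, 2) ground truth.
    """
    dist = torch.linalg.norm(pred - gt, dim=-1)  # (K, T) per-step errors
    ade = dist.mean(dim=-1).min()                # best average displacement
    fde = dist[:, -1].min()                      # best final displacement
    return ade.item(), fde.item()

# e.g. K = 20 samples over a 12-step prediction horizon
print(ade_fde(torch.randn(20, 12, 2), torch.randn(12, 2)))
```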
| Model | Year | ADE | FDE |
| --- | --- | --- | --- |
| STGAT[17] | 2019 | 18.80 | 31.30 |
| Social-STGCNN[7] | 2020 | 20.76 | 33.18 |
| PECNET[15] | 2020 | 9.96 | 15.88 |
| SGCN[8] | 2021 | 11.67 | 19.10 |
| Social-STGCNN+NPSN | 2022 | 11.80 | 18.43 |
| SGCN+NPSN[11] | 2022 | 17.12 | 28.97 |
| Graph-TERN[21] | 2023 | 8.43 | 14.26 |
| Proposed model | — | 9.16 | 15.21 |

Tab.2 Comparison of results on SDD dataset
| Model | Parameters M/10³ | Inference time t/s |
| --- | --- | --- |
| PITF[16] | 360.0 | 0.1145 |
| PECNET[15] | 21.0 | 0.1376 |
| Social-STGCNN[7] | 7.6 | 0.0020 |
| SGCN[8] | 25.0 | 0.1146 |
| SGCN+NPSN[11] | 30.4 | 0.2349 |
| Graph-TERN[21] | 48.5 | 0.0945 |
| TAtt+SAtt | 1.1 | — |
| STGCN + 6-layer TXPCNN | 7.7 | — |
| Sampling | 5.1 | — |
| Proposed model | 13.9 | 0.0879 |

The TAtt+SAtt, STGCN + 6-layer TXPCNN, and Sampling rows break down the proposed model's 13.9×10³ parameters by component (1.1 + 7.7 + 5.1 = 13.9).

Tab.3 Comparison of model parameters and inference time
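The parameter counts (M, in units of 10³) and per-inference times in Tab.3 are the kind of figures obtainable with a generic profiling sketch like the one below; the paper's exact measurement setup (hardware, warm-up, batching) is not specified in this section, so the timing loop here is only illustrative.

```python
import time
import torch

def profile(model: torch.nn.Module, example: torch.Tensor, runs: int = 100):
    """Trainable parameters (in thousands, matching M/10^3) and mean
    per-inference time in seconds (matching t/s)."""
    params_k = sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e3
    model.eval()
    with torch.no_grad():
        model(example)                          # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        t = (time.perf_counter() - start) / runs
    return params_k, t
```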
| Component | Variant | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average | SDD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Attention | w/o | 0.43/0.70 | 0.21/0.36 | 0.28/0.46 | 0.24/0.42 | 0.20/0.35 | 0.27/0.45 | 9.28/15.35 |
| | SAtt | 0.40/0.68 | 0.19/0.35 | 0.23/0.40 | 0.19/0.35 | 0.14/0.27 | 0.23/0.41 | 9.20/15.28 |
| | TAtt | 0.37/0.63 | 0.19/0.33 | 0.22/0.39 | 0.19/0.34 | 0.14/0.27 | 0.24/0.39 | 9.18/15.25 |
| | TAtt+SAtt | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| WeightA | w/o | 0.41/0.67 | 0.21/0.37 | 0.25/0.43 | 0.24/0.38 | 0.18/0.29 | 0.26/0.42 | 9.23/15.30 |
| | $A_{L_2}$ | 0.39/0.65 | 0.19/0.36 | 0.25/0.41 | 0.21/0.36 | 0.15/0.27 | 0.24/0.41 | 9.18/15.25 |
| | $\underline{A_t}$ | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| Sampling | random | 0.62/1.10 | 0.42/0.64 | 0.47/0.85 | 0.34/0.50 | 0.30/0.49 | 0.43/0.71 | 9.30/15.58 |
| | $\underline{\text{purpose}}$ | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| Multi-head | w/o | 0.41/0.67 | 0.17/0.30 | 0.23/0.39 | 0.19/0.34 | 0.14/0.25 | 0.23/0.39 | 9.20/15.25 |
| | 2 | 0.36/0.60 | 0.18/0.33 | 0.23/0.39 | 0.19/0.35 | 0.14/0.26 | 0.22/0.39 | 9.18/15.24 |
| | $\underline{4}$ | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| | 6 | 0.44/0.76 | 0.17/0.30 | 0.23/0.41 | 0.19/0.35 | 0.15/0.28 | 0.24/0.42 | 9.19/15.23 |
| | 8 | 0.39/0.63 | 0.16/0.28 | 0.23/0.40 | 0.19/0.35 | 0.15/0.27 | 0.22/0.39 | 9.17/15.23 |
| Loss | $L_1$ | 0.40/0.67 | 0.19/0.37 | 0.28/0.40 | 0.21/0.37 | 0.20/0.30 | 0.25/0.42 | 9.18/15.27 |
| | $L_2$ | 0.39/0.65 | 0.20/0.36 | 0.24/0.43 | 0.23/0.35 | 0.17/0.27 | 0.24/0.41 | 9.17/15.25 |
| | $\underline{L_1+L_2}$ | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |

Each cell is ADE/FDE; underlined variants are the settings adopted in the proposed model.
Tab.4 Ablation results (ADE/FDE) of the proposed model with different components
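The Loss rows of Tab.4 compare training with $L_1$, $L_2$, and their sum. A minimal sketch of the combined objective follows, assuming $L_1$ and $L_2$ denote the mean absolute and mean squared errors over the predicted future positions; the paper's exact definitions and weighting may differ.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Sum of L1 and L2 errors over predicted future positions.

    pred, gt: (N, T_pred, 2) coordinates. Assumes the table's L1/L2 are
    mean absolute / mean squared displacement errors; the paper may
    define or weight the terms differently.
    """
    return F.l1_loss(pred, gt) + F.mse_loss(pred, gt)
```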
Fig.4 Convergence curve comparison of different loss functions in ablation experiments
Fig.5 Trajectory visualization
[1] LUO Y, CAI P, BERA A, et al. PORCA: modeling and planning for autonomous driving among many pedestrians [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 3418-3425. doi: 10.1109/LRA.2018.2852793
[2] RUDENKO A, PALMIERI L, HERMAN M, et al. Human motion trajectory prediction: a survey [J]. The International Journal of Robotics Research, 2020, 39(8): 895-935. doi: 10.1177/0278364920917446
[3] ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: human trajectory prediction in crowded spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 961-971.
[4] XUE H, HUYNH D Q, REYNOLDS M. SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction [C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe: IEEE, 2018: 1186-1194.
[5] ZHANG P, OUYANG W L, ZHANG P F, et al. SR-LSTM: state refinement for LSTM towards pedestrian trajectory prediction [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12085-12094.
[6] KONG Wei, LIU Yun, LI Hui, et al. A survey of action recognition methods based on graph convolutional network [J]. Control and Decision, 2021, 36(7): 1537-1546.
[7] MOHAMED A, QIAN K, ELHOSEINY M, et al. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 14424-14432.
[8] SHI L, WANG L, LONG C, et al. SGCN: sparse graph convolution network for pedestrian trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 8994-9003.
[9] WU Z, PAN S, CHEN F, et al. A comprehensive survey on graph neural networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 4-24.
[10] GUPTA A, JOHNSON J, FEI-FEI L, et al. Social GAN: socially acceptable trajectories with generative adversarial networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2255-2264.
[11] BAE I, PARK J H, JEON H G. Non-probability sampling network for stochastic human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6477-6487.
[12] MA Y J, INALA J P, JAYARAMAN D, et al. Likelihood-based diverse sampling for trajectory forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 13279-13288.
[13] VEMULA A, MUELLING K, OH J. Social attention: modeling attention in human crowds [C]// 2018 IEEE International Conference on Robotics and Automation. Brisbane: IEEE, 2018: 4601-4607.
[14] KOSARAJU V, SADEGHIAN A, MARTÍN-MARTÍN R, et al. Social-BiGAT: multimodal trajectory forecasting using Bicycle-GAN and graph attention networks [C]// Proceedings of the Annual Conference on Neural Information Processing Systems. Vancouver: NeurIPS, 2019: 1-10.
[15] MANGALAM K, GIRASE H, AGARWAL S, et al. It is not the journey but the destination: endpoint conditioned trajectory prediction [C]// Computer Vision - ECCV 2020: 16th European Conference. Glasgow: Springer International Publishing, 2020: 759-776.
[16] LIANG J, JIANG L, NIEBLES J C, et al. Peeking into the future: predicting future person activities and locations in videos [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5725-5734.
[17] HUANG Y, BI H, LI Z, et al. STGAT: modeling spatial-temporal interactions for human trajectory prediction [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6272-6281.
[18] YU C, MA X, REN J, et al. Spatio-temporal graph transformer networks for pedestrian trajectory prediction [C]// Computer Vision - ECCV 2020: 16th European Conference. Glasgow: Springer International Publishing, 2020: 507-523.
[19] YUAN Y, WENG X, OU Y, et al. AgentFormer: agent-aware transformers for socio-temporal multi-agent forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9813-9823.
[20] SHI L, WANG L, LONG C, et al. Social interpretable tree for pedestrian trajectory prediction [C]// Proceedings of the AAAI Conference on Artificial Intelligence. [s.l.]: AAAI, 2022, 36(2): 2235-2243.