Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (12): 2586-2595    DOI: 10.3785/j.issn.1008-973X.2024.12.018
    
Pedestrian trajectory prediction based on dual-attention spatial-temporal graph convolutional network
Xiaoqian XIANG¹, Jing CHEN¹,²,*
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China

Abstract  

There are two major challenges in current research on pedestrian trajectory prediction: 1) how to effectively extract the spatial-temporal correlation between preceding and subsequent frames of pedestrian motion; 2) how to avoid the performance degradation caused by sampling bias in the trajectory sampling process. To address these two problems, a pedestrian trajectory prediction model was proposed based on a dual-attention spatial-temporal graph convolutional network and a purposive sampling network. Temporal attention was used to capture the correlation between preceding and subsequent frames, and spatial attention was used to capture the correlation among surrounding pedestrians; the spatial-temporal correlations between pedestrians were then further extracted by spatial-temporal graph convolution. Meanwhile, a learnable sampling network was introduced to resolve the uneven sample distribution caused by random sampling. Extensive experiments showed that the accuracy of the proposed method was comparable to that of current state-of-the-art methods on the ETH and UCY datasets, while the number of model parameters and the inference time were reduced by 1.65×10⁴ and 0.147 s, respectively. On the SDD dataset the accuracy decreased slightly, but the number of model parameters was reduced by 3.46×10⁴, showing a good balance between accuracy and efficiency. The proposed model provides a new and effective approach to pedestrian trajectory prediction.
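As a rough illustration of the dual-attention idea described above (not the authors' implementation): temporal attention runs along each pedestrian's frame sequence, spatial attention runs across pedestrians within each frame, and the output feeds the spatial-temporal graph convolution. In the minimal PyTorch sketch below, the layer sizes, residual connections, and the use of nn.MultiheadAttention are assumptions; only the 4-head setting is taken from the ablation in Tab.4.

```python
# Sketch only (assumptions noted above): dual attention over a trajectory
# tensor x of shape (T, N, C) — T frames, N pedestrians, C feature channels.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):   # 4 heads per Tab.4
        super().__init__()
        self.temporal = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Temporal attention: relate each pedestrian's preceding/subsequent frames.
        xt = x.permute(1, 0, 2)                  # (N, T, C): one sequence per pedestrian
        xt, _ = self.temporal(xt, xt, xt)
        x = x + xt.permute(1, 0, 2)              # residual connection (assumption)
        # Spatial attention: relate pedestrians to each other within each frame.
        xs, _ = self.spatial(x, x, x)            # (T, N, C): one "sequence" per frame
        return x + xs                            # feeds the spatial-temporal graph conv

x = torch.randn(8, 5, 32)                        # 8 observed frames, 5 pedestrians
print(DualAttention(32)(x).shape)                # torch.Size([8, 5, 32])
```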



Key words: trajectory prediction; deep learning; graph convolutional network; spatial-temporal graph convolution; temporal attention; spatial attention; trajectory sampling
Received: 19 December 2023      Published: 25 November 2024
CLC:  TP 391  
Fund: Youth Science Foundation of Jiangsu Province (BK20150159)
Corresponding Authors: Jing CHEN     E-mail: 6213113031@stu.jiangnan.edu.cn;chenjing@jiangnan.edu.cn
Cite this article:

Xiaoqian XIANG, Jing CHEN. Pedestrian trajectory prediction based on dual-attention spatial-temporal graph convolutional network. Journal of ZheJiang University (Engineering Science), 2024, 58(12): 2586-2595.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.12.018     OR     https://www.zjujournals.com/eng/Y2024/V58/I12/2586


Fig.1 Overall architecture of the proposed model
Fig.2 Structure of the spatial-temporal graph convolution
Fig.3 Structure of the graph attention network
Model | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average
PITF[16] | 2019 | 0.73/1.65 | 0.30/0.59 | 0.60/1.27 | 0.38/0.81 | 0.31/0.68 | 0.46/1.00
STGAT[17] | 2019 | 0.50/0.84 | 0.26/0.46 | 0.51/1.07 | 0.33/0.64 | 0.30/0.61 | 0.38/0.72
BIGAT[14] | 2019 | 0.69/1.29 | 0.49/1.01 | 0.55/1.32 | 0.30/0.62 | 0.36/0.75 | 0.48/1.00
Social-STGCNN[7] | 2020 | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75
PECNET[15] | 2020 | 0.54/0.87 | 0.18/0.24 | 0.35/0.60 | 0.22/0.39 | 0.17/0.30 | 0.29/0.48
STAR[18] | 2020 | 0.36/0.65 | 0.17/0.36 | 0.31/0.62 | 0.26/0.55 | 0.22/0.46 | 0.26/0.53
SGCN[8] | 2021 | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65
AGENTFORMER[19] | 2021 | 0.45/0.75 | 0.14/0.22 | 0.25/0.45 | 0.18/0.30 | 0.14/0.24 | 0.23/0.39
SIT[20] | 2022 | 0.42/0.60 | 0.21/0.37 | 0.51/0.94 | 0.20/0.34 | 0.17/0.30 | 0.30/0.51
Social-STGCNN+NPSN[11] | 2022 | 0.44/0.65 | 0.21/0.34 | 0.27/0.44 | 0.24/0.43 | 0.21/0.37 | 0.28/0.44
SGCN+NPSN[11] | 2022 | 0.35/0.58 | 0.15/0.25 | 0.22/0.39 | 0.18/0.31 | 0.13/0.24 | 0.21/0.36
Graph-TERN[21] | 2023 | 0.42/0.58 | 0.14/0.23 | 0.26/0.45 | 0.21/0.37 | 0.17/0.29 | 0.24/0.38
Proposed model | — | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38
Tab.1 Comparison of results (ADE/FDE) on ETH and UCY datasets
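ADE (average displacement error) and FDE (final displacement error) in Tab.1 and Tab.2 are the standard metrics of this literature, usually evaluated under a best-of-N protocol over sampled trajectories (N = 20 is the common choice among the cited baselines; taking the minimum independently per metric follows common public implementations). A minimal sketch under those assumptions:

```python
# Best-of-N ADE/FDE (standard definitions; N = 20 is the usual protocol).
# pred: (K, T, 2) sampled future trajectories; gt: (T, 2) ground truth.
import numpy as np

def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    dist = np.linalg.norm(pred - gt[None], axis=-1)   # (K, T) per-step L2 error
    ade = dist.mean(axis=1).min()                     # best mean displacement
    fde = dist[:, -1].min()                           # best final displacement
    return float(ade), float(fde)

pred = np.random.randn(20, 12, 2)   # 20 samples, 12 predicted frames, (x, y)
gt = np.zeros((12, 2))
print(ade_fde(pred, gt))
```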
Model | Year | ADE | FDE
STGAT[17] | 2019 | 18.80 | 31.30
Social-STGCNN[7] | 2020 | 20.76 | 33.18
PECNET[15] | 2020 | 9.96 | 15.88
SGCN[8] | 2021 | 11.67 | 19.10
Social-STGCNN+NPSN[11] | 2022 | 11.80 | 18.43
SGCN+NPSN[11] | 2022 | 17.12 | 28.97
Graph-TERN[21] | 2023 | 8.43 | 14.26
Proposed model | — | 9.16 | 15.21
Tab.2 Comparison of results on SDD dataset
Model | M/10³ | t/s
PITF[16] | 360.0 | 0.1145
PECNET[15] | 21.0 | 0.1376
Social-STGCNN[7] | 7.6 | 0.0020
SGCN[8] | 25.0 | 0.1146
SGCN+NPSN[11] | 30.4 | 0.2349
Graph-TERN[21] | 48.5 | 0.0945
TAtt+SAtt (component) | 1.1 | —
STGCN + 6-layer TXPCNN (component) | 7.7 | —
Sampling (component) | 5.1 | —
Proposed model | 13.9 | 0.0879
Tab.3 Comparison of model parameters and inference time (M is the parameter count in thousands, t the inference time in seconds; the three component rows sum to the proposed model's 13.9×10³ parameters)
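The page does not specify how the Tab.3 numbers were measured; a sketch of the usual PyTorch procedure (the warm-up and run counts and the toy stand-in model are assumptions, not the authors' benchmarking code):

```python
# Typical parameter-count and inference-time measurement (assumed procedure).
import time
import torch

def count_params_k(model: torch.nn.Module) -> float:
    # Total learnable parameters, reported in thousands (M/10^3 in Tab.3).
    return sum(p.numel() for p in model.parameters()) / 1e3

@torch.no_grad()
def mean_inference_time(model: torch.nn.Module, batch, runs: int = 100) -> float:
    model.eval()
    for _ in range(10):                     # warm-up passes (count assumed)
        model(batch)
    start = time.perf_counter()
    for _ in range(runs):
        model(batch)
    return (time.perf_counter() - start) / runs   # seconds per forward pass

model = torch.nn.Linear(16, 2)              # toy stand-in model
print(count_params_k(model))                # 0.034 (34 parameters)
print(mean_inference_time(model, torch.randn(4, 16)))
```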
Component | Variant | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average | SDD
Attention | w/o | 0.43/0.70 | 0.21/0.36 | 0.28/0.46 | 0.24/0.42 | 0.20/0.35 | 0.27/0.45 | 9.28/15.35
 | SAtt | 0.40/0.68 | 0.19/0.35 | 0.23/0.40 | 0.19/0.35 | 0.14/0.27 | 0.23/0.41 | 9.20/15.28
 | TAtt | 0.37/0.63 | 0.19/0.33 | 0.22/0.39 | 0.19/0.34 | 0.14/0.27 | 0.24/0.39 | 9.18/15.25
 | TAtt+SAtt* | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21
Weight | w/o | 0.41/0.67 | 0.21/0.37 | 0.25/0.43 | 0.24/0.38 | 0.18/0.29 | 0.26/0.42 | 9.23/15.30
 | $A_{L_2}$ | 0.39/0.65 | 0.19/0.36 | 0.25/0.41 | 0.21/0.36 | 0.15/0.27 | 0.24/0.41 | 9.18/15.25
 | $A_t$* | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21
Sampling | random | 0.62/1.10 | 0.42/0.64 | 0.47/0.85 | 0.34/0.50 | 0.30/0.49 | 0.43/0.71 | 9.30/15.58
 | purpose* | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21
Multi-head | w/o | 0.41/0.67 | 0.17/0.30 | 0.23/0.39 | 0.19/0.34 | 0.14/0.25 | 0.23/0.39 | 9.20/15.25
 | 2 | 0.36/0.60 | 0.18/0.33 | 0.23/0.39 | 0.19/0.35 | 0.14/0.26 | 0.22/0.39 | 9.18/15.24
 | 4* | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21
 | 6 | 0.44/0.76 | 0.17/0.30 | 0.23/0.41 | 0.19/0.35 | 0.15/0.28 | 0.24/0.42 | 9.19/15.23
 | 8 | 0.39/0.63 | 0.16/0.28 | 0.23/0.40 | 0.19/0.35 | 0.15/0.27 | 0.22/0.39 | 9.17/15.23
Loss | $L_1$ | 0.40/0.67 | 0.19/0.37 | 0.28/0.40 | 0.21/0.37 | 0.20/0.30 | 0.25/0.42 | 9.18/15.27
 | $L_2$ | 0.39/0.65 | 0.20/0.36 | 0.24/0.43 | 0.23/0.35 | 0.17/0.27 | 0.24/0.41 | 9.17/15.25
 | $L_1+L_2$* | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21
Tab.4 Ablation results (ADE/FDE; the last column is ADE/FDE on SDD) of the proposed model with different components; * marks the adopted setting
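The Sampling row of Tab.4 contrasts random sampling with the learnable (purposive) sampler, which accounts for most of the ADE/FDE gap. The architecture of the paper's sampling network is not given on this page; the sketch below only illustrates the contrast, with a hypothetical MLP sampler (all dimensions assumed) that predicts K latent codes from an observed-history feature instead of drawing them i.i.d. from a Gaussian:

```python
# Random vs. learnable (purposive) sampling — illustrative only; the MLP
# sampler and all dimensions are hypothetical.
import torch
import torch.nn as nn

K, D, H = 20, 16, 64        # samples, latent dim, history-feature dim (assumed)

def random_codes(batch: int) -> torch.Tensor:
    # i.i.d. Gaussian codes: unbiased but can cluster, leaving modes uncovered.
    return torch.randn(batch, K, D)

class PurposiveSampler(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(H, 128), nn.ReLU(), nn.Linear(128, K * D))

    def forward(self, hist_feat: torch.Tensor) -> torch.Tensor:
        # K codes predicted from the scene, trainable (e.g., with a best-of-K
        # loss) to spread over plausible futures.
        return self.net(hist_feat).view(-1, K, D)

feat = torch.randn(4, H)
print(random_codes(4).shape, PurposiveSampler()(feat).shape)
```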
Fig.4 Convergence curves of different loss functions in the ablation experiments
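The Loss rows of Tab.4 and the curves in Fig.4 compare training with L1, L2, and their sum; a minimal sketch of the combined regression loss (equal weighting is an assumption — the page gives no coefficients):

```python
# Combined L1 + L2 trajectory regression loss (equal weights assumed).
import torch
import torch.nn.functional as F

def combined_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    return F.l1_loss(pred, gt) + F.mse_loss(pred, gt)

pred = torch.randn(12, 2, requires_grad=True)   # 12 predicted frames, (x, y)
gt = torch.randn(12, 2)
combined_loss(pred, gt).backward()
```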
Fig.5 Trajectory visualization
[1] LUO Y, CAI P, BERA A, et al. PORCA: modeling and planning for autonomous driving among many pedestrians [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 3418–3425. doi: 10.1109/LRA.2018.2852793
[2] RUDENKO A, PALMIERI L, HERMAN M, et al. Human motion trajectory prediction: a survey [J]. The International Journal of Robotics Research, 2020, 39(8): 895–935. doi: 10.1177/0278364920917446
[3] ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: human trajectory prediction in crowded spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 961–971.
[4] XUE H, HUYNH D Q, REYNOLDS M. SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction [C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Tahoe: IEEE, 2018: 1186–1194.
[5] ZHANG P, OUYANG W L, ZHANG P F, et al. SR-LSTM: state refinement for LSTM towards pedestrian trajectory prediction [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12085–12094.
[6] KONG Wei, LIU Yun, LI Hui, et al. A survey of action recognition methods based on graph convolutional network [J]. Control and Decision, 2021, 36(7): 1537–1546. (in Chinese)
[7] MOHAMED A, QIAN K, ELHOSEINY M, et al. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 14424–14432.
[8] SHI L, WANG L, LONG C, et al. SGCN: sparse graph convolution network for pedestrian trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 8994–9003.
[9] WU Z, PAN S, CHEN F, et al. A comprehensive survey on graph neural networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 4–24.
[10] GUPTA A, JOHNSON J, FEI-FEI L, et al. Social GAN: socially acceptable trajectories with generative adversarial networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2255–2264.
[11] BAE I, PARK J H, JEON H G. Non-probability sampling network for stochastic human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 6477–6487.
[12] MA Y J, INALA J P, JAYARAMAN D, et al. Likelihood-based diverse sampling for trajectory forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 13279–13288.
[13] VEMULA A, MUELLING K, OH J. Social attention: modeling attention in human crowds [C]// 2018 IEEE International Conference on Robotics and Automation. Brisbane: IEEE, 2018: 4601–4607.
[14] KOSARAJU V, SADEGHIAN A, MARTÍN-MARTÍN R, et al. Social-BiGAT: multimodal trajectory forecasting using Bicycle-GAN and graph attention networks [C]// Proceedings of the Annual Conference on Neural Information Processing Systems. Vancouver: NeurIPS, 2019: 1–10.
[15] MANGALAM K, GIRASE H, AGARWAL S, et al. It is not the journey but the destination: endpoint conditioned trajectory prediction [C]// Computer Vision – ECCV 2020: 16th European Conference. Glasgow: Springer International Publishing, 2020: 759–776.
[16] LIANG J, JIANG L, NIEBLES J C, et al. Peeking into the future: predicting future person activities and locations in videos [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5725–5734.
[17] HUANG Y, BI H, LI Z, et al. STGAT: modeling spatial-temporal interactions for human trajectory prediction [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6272–6281.
[18] YU C, MA X, REN J, et al. Spatio-temporal graph transformer networks for pedestrian trajectory prediction [C]// Computer Vision – ECCV 2020: 16th European Conference. Glasgow: Springer International Publishing, 2020: 507–523.
[19] YUAN Y, WENG X, OU Y, et al. AgentFormer: agent-aware transformers for socio-temporal multi-agent forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9813–9823.
[20] SHI L, WANG L, LONG C, et al. Social interpretable tree for pedestrian trajectory prediction [C]// Proceedings of the AAAI Conference on Artificial Intelligence. [s.l.]: AAAI, 2022, 36(2): 2235–2243.