Journal of Zhejiang University (Engineering Science)  2024, Vol. 58, Issue (12): 2586-2595    DOI: 10.3785/j.issn.1008-973X.2024.12.018
Traffic Engineering
Pedestrian trajectory prediction based on dual-attention spatial-temporal graph convolutional network
Xiaoqian XIANG1, Jing CHEN1,2,*
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China
Abstract:

Current research on pedestrian trajectory prediction faces two major challenges: 1) how to effectively extract the spatial-temporal correlations between a pedestrian's consecutive frames; 2) how to avoid the performance degradation caused by sampling bias in the trajectory sampling process. To address these problems, a pedestrian trajectory prediction model was proposed based on a dual-attention spatial-temporal graph convolutional network and a purposive sampling network. Temporal attention was utilized to capture the correlation between consecutive frames, and spatial attention was utilized to capture the correlation among surrounding pedestrians; the spatial-temporal correlations between pedestrians were then further extracted by spatial-temporal graph convolution. A learnable sampling network was introduced to resolve the uneven distribution caused by random sampling. Extensive experiments showed that the accuracy of the proposed method was comparable to that of the current state-of-the-art methods on the ETH and UCY datasets, while the number of model parameters and the inference time were reduced by 1.65×10⁴ and 0.147 s, respectively. On the SDD dataset the accuracy decreased slightly, but the number of model parameters was reduced by 3.46×10⁴, showing a good performance balance. The proposed model can provide a new and effective approach to pedestrian trajectory prediction.

Key words: trajectory prediction; deep learning; graph convolutional network; spatial-temporal graph convolution; temporal attention; spatial attention; trajectory sampling
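The dual-attention design described in the abstract — temporal attention across a pedestrian's consecutive frames and spatial attention among pedestrians within a frame — can be sketched as scaled dot-product attention applied along the two axes of a (frames × pedestrians × features) tensor. The shapes, the single-head form, and the additive fusion below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last axis.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# x: (T frames, N pedestrians, D features) -- illustrative embedding
T, N, D = 8, 5, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((T, N, D))

# Temporal attention: each pedestrian attends over its own frames.
xt = x.transpose(1, 0, 2)                             # (N, T, D)
temporal = attention(xt, xt, xt).transpose(1, 0, 2)   # back to (T, N, D)

# Spatial attention: within each frame, pedestrians attend to each other.
spatial = attention(x, x, x)                          # (T, N, D)

fused = temporal + spatial                            # (T, N, D)
print(fused.shape)  # (8, 5, 16)
```

In the actual model these attended features feed the spatial-temporal graph convolution; here the two branches are simply summed to show the tensor flow.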
Received: 2023-12-19    Published: 2024-11-25
CLC: TP 391
Foundation item: Jiangsu Provincial Youth Science Foundation (BK20150159)
Corresponding author: Jing CHEN. E-mail: 6213113031@stu.jiangnan.edu.cn; chenjing@jiangnan.edu.cn
About the author: XIANG Xiaoqian (1998—), female, master's student, engaged in pedestrian trajectory prediction research. orcid.org/0009-0001-5551-8240. E-mail: 6213113031@stu.jiangnan.edu.cn
Cite this article:

Xiaoqian XIANG, Jing CHEN. Pedestrian trajectory prediction based on dual-attention spatial-temporal graph convolutional network. Journal of Zhejiang University (Engineering Science), 2024, 58(12): 2586-2595.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.12.018        https://www.zjujournals.com/eng/CN/Y2024/V58/I12/2586

Fig. 1  Overall architecture of the proposed model
Fig. 2  Structure of the spatial-temporal graph convolution
Fig. 3  Structure of the graph attention network
| Model | Year | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PITF[16] | 2019 | 0.73/1.65 | 0.30/0.59 | 0.60/1.27 | 0.38/0.81 | 0.31/0.68 | 0.46/1.00 |
| STGAT[17] | 2019 | 0.50/0.84 | 0.26/0.46 | 0.51/1.07 | 0.33/0.64 | 0.30/0.61 | 0.38/0.72 |
| BIGAT[14] | 2019 | 0.69/1.29 | 0.49/1.01 | 0.55/1.32 | 0.30/0.62 | 0.36/0.75 | 0.48/1.00 |
| Social-STGCNN[7] | 2020 | 0.64/1.11 | 0.49/0.85 | 0.44/0.79 | 0.34/0.53 | 0.30/0.48 | 0.44/0.75 |
| PECNET[15] | 2020 | 0.54/0.87 | 0.18/0.24 | 0.35/0.60 | 0.22/0.39 | 0.17/0.30 | 0.29/0.48 |
| STAR[18] | 2020 | 0.36/0.65 | 0.17/0.36 | 0.31/0.62 | 0.26/0.55 | 0.22/0.46 | 0.26/0.53 |
| SGCN[8] | 2021 | 0.63/1.03 | 0.32/0.55 | 0.37/0.70 | 0.29/0.53 | 0.25/0.45 | 0.37/0.65 |
| AGENTFORMER[19] | 2021 | 0.45/0.75 | 0.14/0.22 | 0.25/0.45 | 0.18/0.30 | 0.14/0.24 | 0.23/0.39 |
| SIT[20] | 2022 | 0.42/0.60 | 0.21/0.37 | 0.51/0.94 | 0.20/0.34 | 0.17/0.30 | 0.30/0.51 |
| Social-STGCNN+NPSN[11] | 2022 | 0.44/0.65 | 0.21/0.34 | 0.27/0.44 | 0.24/0.43 | 0.21/0.37 | 0.28/0.44 |
| SGCN+NPSN[11] | 2022 | 0.35/0.58 | 0.15/0.25 | 0.22/0.39 | 0.18/0.31 | 0.13/0.24 | 0.21/0.36 |
| Graph-TERN[21] | 2023 | 0.42/0.58 | 0.14/0.23 | 0.26/0.45 | 0.21/0.37 | 0.17/0.29 | 0.24/0.38 |
| Ours | — | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 |

Table 1  Comparison of results (ADE/FDE) on the ETH and UCY datasets
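The ADE/FDE metrics reported in Tables 1 and 2 are the standard displacement errors: ADE averages the Euclidean distance between predicted and ground-truth positions over all predicted timesteps, while FDE measures the distance at the final timestep only; stochastic predictors are conventionally scored by the best of K sampled trajectories (K = 20 in most of the cited work). A minimal sketch:

```python
import numpy as np

def ade_fde(pred, gt):
    """pred: (K samples, T timesteps, 2); gt: (T, 2).
    Returns best-of-K ADE and FDE (minimum over samples)."""
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T)
    ade = dist.mean(axis=1).min()   # average error of the best sample
    fde = dist[:, -1].min()         # final-step error of the best sample
    return ade, fde

# Toy data: a straight ground-truth path and 20 samples,
# one of which matches it exactly.
gt = np.stack([np.arange(12.0), np.zeros(12)], axis=-1)  # (12, 2)
pred = np.repeat(gt[None], 20, axis=0)                   # (20, 12, 2)
pred[1:] += 0.5                                          # offset all but one
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # 0.0 0.0 -- the exact sample wins under best-of-K
```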
| Model | Year | ADE | FDE |
| --- | --- | --- | --- |
| STGAT[17] | 2019 | 18.80 | 31.30 |
| Social-STGCNN[7] | 2020 | 20.76 | 33.18 |
| PECNET[15] | 2020 | 9.96 | 15.88 |
| SGCN[8] | 2021 | 11.67 | 19.10 |
| Social-STGCNN+NPSN | 2022 | 11.80 | 18.43 |
| SGCN+NPSN[11] | 2022 | 17.12 | 28.97 |
| Graph-TERN[21] | 2023 | 8.43 | 14.26 |
| Ours | — | 9.16 | 15.21 |

Table 2  Comparison of results on the SDD dataset
| Model | M/10³ | t/s |
| --- | --- | --- |
| PITF[16] | 360.0 | 0.1145 |
| PECNET[15] | 21.0 | 0.1376 |
| Social-STGCNN[7] | 7.6 | 0.0020 |
| SGCN[8] | 25.0 | 0.1146 |
| SGCN+NPSN[11] | 30.4 | 0.2349 |
| Graph-TERN[21] | 48.5 | 0.0945 |
| TAtt+SAtt | 1.1 | — |
| STGCN + 6-layer TXPCNN | 7.7 | — |
| Sampling | 5.1 | — |
| Ours | 13.9 | 0.0879 |

Table 3  Comparison of model parameters and inference time
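The M/10³ and t/s columns of Table 3 report the number of learnable parameters (in thousands) and the per-inference wall-clock time. The toy two-layer network below only illustrates how those two quantities are measured; its sizes are arbitrary and unrelated to the paper's model:

```python
import time
import numpy as np

# Toy two-layer network standing in for a trajectory predictor.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 32)), np.zeros(32)
W2, b2 = rng.standard_normal((32, 2)), np.zeros(2)

def forward(x):
    # ReLU MLP: (batch, 16) -> (batch, 2) predicted offsets
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

# M: total learnable parameters (papers often report it in units of 10^3)
n_params = sum(p.size for p in (W1, b1, W2, b2))
print(n_params)  # 610

# t: average wall-clock time of one forward pass
x = rng.standard_normal((8, 16))
start = time.perf_counter()
for _ in range(100):
    forward(x)
t = (time.perf_counter() - start) / 100
print(f"{t:.6f} s per inference")
```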
| Component | Variant | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | Average | SDD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Attention | w/o | 0.43/0.70 | 0.21/0.36 | 0.28/0.46 | 0.24/0.42 | 0.20/0.35 | 0.27/0.45 | 9.28/15.35 |
| | SAtt | 0.40/0.68 | 0.19/0.35 | 0.23/0.40 | 0.19/0.35 | 0.14/0.27 | 0.23/0.41 | 9.20/15.28 |
| | TAtt | 0.37/0.63 | 0.19/0.33 | 0.22/0.39 | 0.19/0.34 | 0.14/0.27 | 0.24/0.39 | 9.18/15.25 |
| | TAtt+SAtt (adopted) | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| Weight A | w/o | 0.41/0.67 | 0.21/0.37 | 0.25/0.43 | 0.24/0.38 | 0.18/0.29 | 0.26/0.42 | 9.23/15.30 |
| | ${A_{L_2}}$ | 0.39/0.65 | 0.19/0.36 | 0.25/0.41 | 0.21/0.36 | 0.15/0.27 | 0.24/0.41 | 9.18/15.25 |
| | ${A_t}$ (adopted) | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| Sampling | random | 0.62/1.10 | 0.42/0.64 | 0.47/0.85 | 0.34/0.50 | 0.30/0.49 | 0.43/0.71 | 9.30/15.58 |
| | purpose (adopted) | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| Multi-head | w/o | 0.41/0.67 | 0.17/0.30 | 0.23/0.39 | 0.19/0.34 | 0.14/0.25 | 0.23/0.39 | 9.20/15.25 |
| | 2 | 0.36/0.60 | 0.18/0.33 | 0.23/0.39 | 0.19/0.35 | 0.14/0.26 | 0.22/0.39 | 9.18/15.24 |
| | 4 (adopted) | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |
| | 6 | 0.44/0.76 | 0.17/0.30 | 0.23/0.41 | 0.19/0.35 | 0.15/0.28 | 0.24/0.42 | 9.19/15.23 |
| | 8 | 0.39/0.63 | 0.16/0.28 | 0.23/0.40 | 0.19/0.35 | 0.15/0.27 | 0.22/0.39 | 9.17/15.23 |
| Loss | ${L_1}$ | 0.40/0.67 | 0.19/0.37 | 0.28/0.40 | 0.21/0.37 | 0.20/0.30 | 0.25/0.42 | 9.18/15.27 |
| | ${L_2}$ | 0.39/0.65 | 0.20/0.36 | 0.24/0.43 | 0.23/0.35 | 0.17/0.27 | 0.24/0.41 | 9.17/15.25 |
| | ${L_1+L_2}$ (adopted) | 0.37/0.60 | 0.17/0.30 | 0.23/0.39 | 0.19/0.33 | 0.14/0.26 | 0.22/0.38 | 9.16/15.21 |

All entries are ADE/FDE; "(adopted)" marks the configuration used in the final model (underlined in the original).

Table 4  Ablation results of the proposed model with different components (ADE/FDE)
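The Loss rows of Table 4 compare L1, L2, and their combination. A minimal sketch of a combined L1 + L2 trajectory loss follows; the equal weighting and per-timestep averaging are assumptions for illustration, since the paper's exact formulation is not given on this page:

```python
import numpy as np

def combined_loss(pred, gt, alpha=1.0):
    """L1 + alpha * L2 trajectory loss, averaged over timesteps.
    pred, gt: (T, 2) predicted and ground-truth positions.
    alpha=1 (equal weighting) is an illustrative assumption."""
    err = pred - gt
    l1 = np.abs(err).sum(axis=-1).mean()   # mean absolute displacement
    l2 = (err ** 2).sum(axis=-1).mean()    # mean squared displacement
    return l1 + alpha * l2

# Toy check: a constant 0.1 m offset in both coordinates.
gt = np.zeros((12, 2))
pred = np.full((12, 2), 0.1)
loss = combined_loss(pred, gt)
print(round(loss, 3))  # 0.22 -- L1 term 0.2 plus L2 term 0.02
```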
Fig. 4  Comparison of convergence curves of different loss functions in the ablation experiments
Fig. 5  Trajectory visualization
1 LUO Y, CAI P, BERA A, et al. PORCA: modeling and planning for autonomous driving among many pedestrians [J]. IEEE Robotics and Automation Letters, 2018, 3(4): 3418-3425. doi: 10.1109/LRA.2018.2852793
2 RUDENKO A, PALMIERI L, HERMAN M, et al. Human motion trajectory prediction: a survey [J]. The International Journal of Robotics Research, 2020, 39(8): 895-935. doi: 10.1177/0278364920917446
3 ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: human trajectory prediction in crowded spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE , 2016: 961–971.
4 XUE H, HUYNH D Q, REYNOLDS M. SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction [C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) . Lake Tahoe: IEEE, 2018: 1186–1194.
5 ZHANG P, OUYANG W L, ZHANG P F, et al. SR-LSTM: state refinement for LSTM towards pedestrian trajectory prediction [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 12085–12094.
6 KONG Wei, LIU Yun, LI Hui, et al. A survey of action recognition methods based on graph convolutional network [J]. Control and Decision, 2021, 36(7): 1537-1546. (in Chinese)
7 MOHAMED A, QIAN K, ELHOSEINY M, et al. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 14424–14432.
8 SHI L, WANG L, LONG C, et al. SGCN: sparse graph convolution network for pedestrian trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 8994–9003.
9 WU Z, PAN S, CHEN F, et al. A comprehensive survey on graph neural networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 4-24
10 GUPTA A, JOHNSON J, FEI-FEI L, et al. Social GAN: socially acceptable trajectories with generative adversarial networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake: IEEE, 2018: 2255–2264.
11 BAE I, PARK J H, JEON H G. Non-probability sampling network for stochastic human trajectory prediction [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 6477–6487.
12 MA Y J, INALA J P, JAYARAMAN D, et al. Likelihood-based diverse sampling for trajectory forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 13279–13288.
13 VEMULA A, MUELLING K, OH J. Social attention: modeling attention in human crowds [C]// 2018 IEEE International Conference on Robotics and Automation . Brisbane: IEEE, 2018: 4601–4607.
14 KOSARAJU V, SADEGHIAN A, MARTÍN-MARTÍN R, et al. Social-BiGAT: multimodal trajectory forecasting using Bicycle-GAN and graph attention networks [C]// Proceedings of the Annual Conference on Neural Information Processing Systems. Vancouver: NeurIPS, 2019: 1–10.
15 MANGALAM K, GIRASE H, AGARWAL S, et al. It is not the journey but the destination: endpoint conditioned trajectory prediction [C]// Computer Vision–ECCV 2020: 16th European Conference . Glasgow: Springer International Publishing, 2020: 759–776.
16 LIANG J, JIANG L, NIEBLES J C, et al. Peeking into the future: predicting future person activities and locations in videos [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 5725–5734.
17 HUANG Y, BI H, LI Z, et al. STGAT: modeling spatial-temporal interactions for human trajectory prediction [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6272–6281.
18 YU C, MA X, REN J, et al. Spatio-temporal graph transformer networks for pedestrian trajectory prediction [C]// Computer Vision-ECCV 2020: 16th European Conference . Glasgow: Springer International Publishing, 2020: 507–523.
19 YUAN Y, WENG X, OU Y, et al. Agentformer: agent-aware transformers for socio-temporal multi-agent forecasting [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 9813–9823.
20 SHI L, WANG L, LONG C, et al. Social interpretable tree for pedestrian trajectory prediction [C]// Proceedings of the AAAI Conference on Artificial Intelligence . [s.l.]: AAAI, 2022, 36(2): 2235–2243.