Journal of Zhejiang University (Engineering Science), 2025, Vol. 59, Issue 9: 1784-1792    DOI: 10.3785/j.issn.1008-973X.2025.09.002
Computer Technology
Traffic scene perception algorithm based on cross-task bidirectional feature interaction
Pengzhi LIN1, Ming’en ZHONG1,*, Kang FAN2, Jiawei TAN2, Zhiqiang LIN1
1. School of Mechanical and Automotive Engineering, Xiamen University of Technology, Xiamen 361024, China
2. School of Aerospace Engineering, Xiamen University, Xiamen 361005, China
Abstract:

A traffic scene perception algorithm, SDFormer++, was proposed for autonomous driving in urban street scenes to improve the overall performance of traffic scene perception. Based on the principle of cross-task bidirectional feature interaction, the algorithm exploits the explicit and implicit correlations between the semantic segmentation and depth estimation tasks. An interaction gated linear unit was added in the cross-task feature extraction stage to form high-quality task-specific feature representations. A multi-task feature interaction module using a bidirectional attention mechanism was constructed to enhance the initial task-specific features with feature information shared across the task domains. A multi-scale feature fusion module was designed to integrate information from different levels and obtain fine high-resolution features. Experimental results on the Cityscapes dataset showed that the algorithm achieved a mean intersection over union (mIoU) of 82.4% for pixel segmentation, a root mean square error (RMSE) of 4.453 and an absolute relative error (ARE) of 0.130 for depth estimation, and an average distance estimation error of 6.0% for five typical classes of traffic participants, outperforming mainstream multi-task algorithms such as InvPT++ and SDFormer.

Key words: cross-task interaction    multi-task learning    traffic environment perception    semantic segmentation    depth estimation
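This page does not include implementation code. The following is a minimal PyTorch-style sketch of the bidirectional cross-task interaction idea described above, in which each task's features attend to the other task's features and are enhanced through a residual connection. All class and function names (CrossTaskAttention, BidirectionalTaskInteraction) are illustrative assumptions, not the authors' implementation of the multi-task feature interaction module.

```python
# Hypothetical sketch of bidirectional cross-task feature interaction
# (names and design choices are assumptions, not the paper's code).
import torch
import torch.nn as nn


class CrossTaskAttention(nn.Module):
    """One direction of interaction: queries from the target task,
    keys/values from the source task."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target, source: (B, N, C) token sequences of task-specific features
        enhanced, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + enhanced)  # residual connection


class BidirectionalTaskInteraction(nn.Module):
    """Exchange information between segmentation and depth features
    in both directions, as sketched from the abstract's description."""

    def __init__(self, dim: int):
        super().__init__()
        self.seg_from_depth = CrossTaskAttention(dim)
        self.depth_from_seg = CrossTaskAttention(dim)

    def forward(self, seg_feat: torch.Tensor, depth_feat: torch.Tensor):
        seg_out = self.seg_from_depth(seg_feat, depth_feat)
        depth_out = self.depth_from_seg(depth_feat, seg_feat)
        return seg_out, depth_out


if __name__ == "__main__":
    seg = torch.randn(2, 1024, 256)    # (batch, tokens, channels)
    depth = torch.randn(2, 1024, 256)
    module = BidirectionalTaskInteraction(dim=256)
    seg2, depth2 = module(seg, depth)
    print(seg2.shape, depth2.shape)    # torch.Size([2, 1024, 256]) each
```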
Received: 2024-12-05    Published: 2025-08-25
CLC:  TP 391.4  
Funding: Supported by the Natural Science Foundation of Fujian Province (2023J011439).
Corresponding author: Ming’en ZHONG     E-mail: 2477541661@qq.com; zhongmingen@xmut.edu.cn
About the first author: Pengzhi LIN (born in 2000), male, master's degree candidate, engaged in research on machine vision and intelligent transportation. orcid.org/0009-0005-8197-9429. E-mail: 2477541661@qq.com

Cite this article:

Pengzhi LIN, Ming’en ZHONG, Kang FAN, Jiawei TAN, Zhiqiang LIN. Traffic scene perception algorithm based on cross-task bidirectional feature interaction. Journal of Zhejiang University (Engineering Science), 2025, 59(9): 1784-1792.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.09.002        https://www.zjujournals.com/eng/CN/Y2025/V59/I9/1784

Fig. 1  Overall architecture of the traffic scene perception algorithm SDFormer++
Fig. 2  Structure of the cross-task feature extraction module
Fig. 3  Structure of the multi-task feature interaction module
Fig. 4  Structure of the multi-scale feature fusion module
Module   mIoU/%   RMSE    ARE     Np/10^6   GFLOPs
MTL      73.2     5.355   0.227   66.3      131.1
+CFE     77.8     5.068   0.179   74.5      157.4
+MFI     78.6     4.790   0.168   75.6      168.4
+MFF     79.3     4.698   0.154   76.1      177.0
Table 1  Ablation results of different network components
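The metrics reported in Tables 1 to 5 follow their standard definitions. For reference, the sketch below (illustrative only, not the authors' evaluation code) computes mIoU from class-wise intersections and unions, and RMSE and ARE over valid depth pixels, using NumPy.

```python
# Standard-definition sketch of the reported metrics (not the paper's code).
import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))


def depth_errors(pred: np.ndarray, gt: np.ndarray):
    """RMSE and absolute relative error (ARE) on valid (gt > 0) pixels."""
    mask = gt > 0
    rmse = np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2))
    are = np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask])
    return float(rmse), float(are)


if __name__ == "__main__":
    seg_pred = np.random.randint(0, 19, (512, 1024))
    seg_gt = np.random.randint(0, 19, (512, 1024))
    d_pred = np.random.uniform(1, 80, (512, 1024))
    d_gt = np.random.uniform(1, 80, (512, 1024))
    print(mean_iou(seg_pred, seg_gt, 19), depth_errors(d_pred, d_gt))
```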
Fig. 5  Visual comparison of attention patterns of different feature extraction modules
Module   Backbone   mIoU/%   RMSE    ARE     FPS
MFFS     Swin-S     75.8     4.811   0.176   30.8
MFFM     Swin-S     79.3     4.698   0.154   26.2
MFFL     Swin-S     79.5     4.662   0.151   10.6
Table 2  Ablation results of the multi-scale feature fusion module
Algorithm    Backbone     mIoU/%   RMSE    ARE     Np/10^6
JTR          SegNet       72.3     5.582   0.163   79.6
MTPSL        SegNet       73.6     5.135   0.165   84.5
DenseMTL     ResNet-101   75.0     6.649   0.194   124.3
SwinMTL      Swin-B       76.4     4.489   0.134   65.2
SDFormer     Swin-B       79.2     4.485   0.132   116.7
InvPT++      ViT-B        82.0     4.527   0.146   156.9
SDFormer++   Swin-B       82.4     4.453   0.130   129.4
Table 3  Performance comparison of different multi-task algorithms
Fig. 6  Comparison of inference results of SDFormer++, SDFormer and the second-best algorithm on the semantic segmentation task
Fig. 7  Comparison of inference results of SDFormer++, SDFormer and the second-best algorithm on the depth estimation task
Method            Backbone          mIoU/%   Np/10^6   GFLOPs
CSFNet-2          STDC2             76.3     19.4      47.8
WaveMix           WaveMix           80.7     63.2      161.5
DSNet-Base        DSNet-Base        82.0     68.0      226.6
CMX(B4)           MiT-B4            82.6     140.0     134.0
EfficientViT-B3   EfficientViT-L2   83.2     53.1      396.2
SDFormer++        Swin-B            82.4     129.4     272.5
Table 4  Performance comparison between SDFormer++ and single-task semantic segmentation algorithms
Method        Backbone   RMSE    ARE     Np/10^6   GFLOPs
Manydepth2    HRNet 16   5.827   0.097   123.1     246.4
DepthFormer   Swin-B     4.326   0.127   151.3     282.0
PixelFormer   Swin-B     4.258   0.115   146.1     346.4
SDFormer++    Swin-B     4.453   0.130   129.4     272.5
Table 5  Performance comparison between SDFormer++ and single-task depth estimation algorithms
Method       Pedestrian   Cyclist   Car   Bus   Truck   Avg
DenseMTL     7.7          8.6       8.8   6.7   8.2     8.0
JTR          8.5          6.5       8.0   7.1   6.7     7.3
MTPSL        8.6          6.3       7.7   5.4   7.3     7.0
SwinMTL      7.3          6.8       7.0   6.4   6.7     6.8
InvPT++      6.6          6.2       6.6   5.8   6.3     6.3
SDFormer     6.1          5.4       7.4   5.2   6.5     6.1
SDFormer++   5.8          5.5       7.2   4.9   6.4     6.0
Table 6  Comparison of distance estimation errors (MRE/%) for different traffic participants
Distance   Pedestrian   Cyclist   Car   Bus   Truck   Avg
           3.3          4.7       5.0   4.1   3.0     4.0
           5.2          5.3       4.5   5.0   5.4     5.1
           11.7         10.2      9.5   9.3   9.8     10.1
Table 7  Distance estimation errors (MRE/%) over different distance ranges
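Tables 6 and 7 report a mean relative error (MRE) of distance estimation per class of traffic participant. The exact evaluation protocol is not given on this page; the sketch below shows one plausible way to compute such a per-class MRE under the assumption that each object's distance is taken as the median predicted depth inside its mask and compared with a ground-truth distance. The function name per_class_mre and the data layout are hypothetical.

```python
# Illustrative sketch (assumption, not the paper's protocol): estimate each
# object's distance as the median predicted depth inside its mask, then
# average the relative errors per class.
from collections import defaultdict
import numpy as np


def per_class_mre(pred_depth, instance_masks, gt_distances, classes):
    """pred_depth: (H, W) depth map; instance_masks: list of (H, W) bool masks;
    gt_distances: true distance per instance; classes: class label per instance."""
    errors = defaultdict(list)
    for mask, gt_d, cls in zip(instance_masks, gt_distances, classes):
        est_d = np.median(pred_depth[mask])           # robust per-object distance
        errors[cls].append(abs(est_d - gt_d) / gt_d)  # relative error
    return {cls: 100.0 * float(np.mean(errs)) for cls, errs in errors.items()}


if __name__ == "__main__":
    depth = np.random.uniform(5, 60, (256, 512))
    masks = [np.zeros((256, 512), dtype=bool) for _ in range(2)]
    masks[0][100:120, 200:240] = True   # e.g. a car region
    masks[1][50:90, 300:360] = True     # e.g. a bus region
    print(per_class_mre(depth, masks, gt_distances=[20.0, 35.0],
                        classes=["car", "bus"]))
```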
Fig. 8  Distance prediction results for typical traffic participants under different lighting and weather conditions
1 JIN Lisheng, HUA Qiang, GUO Baicang, et al Multi-target tracking of vehicles based on optimized DeepSort[J]. Journal of Zhejiang University: Engineering Science, 2021, 55 (6): 1056-1064
2 XIAO X, ZHAO Y, ZHANG F, et al BASeg: boundary aware semantic segmentation for autonomous driving[J]. Neural Networks, 2023, 157 (12): 460- 470
3 ABDIGAPPOROV S, MIRALIEV S, KAKANI V, et al Joint multiclass object detection and semantic segmentation for autonomous driving[J]. IEEE Access, 2023, 11: 37637- 37649
doi: 10.1109/ACCESS.2023.3266284
4 LV J, TONG H, PAN Q, et al. Importance-aware image segmentation-based semantic communication for autonomous driving [EB/OL]. (2024-01-06) [2024-12-05]. https://arxiv.org/pdf/2401.10153.
5 LAHIRI S, REN J, LIN X Deep learning-based stereopsis and monocular depth estimation techniques: a review[J]. Vehicles, 2024, 6 (1): 305- 351
doi: 10.3390/vehicles6010013
6 JUN W, YOO J, LEE S Synthetic data enhancement and network compression technology of monocular depth estimation for real-time autonomous driving system[J]. Sensors, 2024, 24 (13): 4205
doi: 10.3390/s24134205
7 RAJAPAKSHA U, SOHEL F, LAGA H, et al Deep learning-based depth estimation methods from monocular image and videos: a comprehensive survey[J]. ACM Computing Surveys, 2024, 56 (12): 1- 51
8 FENG Y, SUN X, DIAO W, et al Height aware understanding of remote sensing images based on cross-task interaction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2023, 195 (4): 233- 249
9 SAMANT R M, BACHUTE M R, GITE S, et al Framework for deep learning-based language models using multi-task learning in natural language understanding: a systematic literature review and future directions[J]. IEEE Access, 2022, 10: 17078- 17097
doi: 10.1109/ACCESS.2022.3149798
10 ZHANG H, LIU H, KIM C Semantic and instance segmentation in coastal urban spatial perception: a multi-task learning framework with an attention mechanism[J]. Sustainability, 2024, 16 (2): 833
doi: 10.3390/su16020833
11 AGAND P, MAHDAVIAN M, SAVVA M, et al. LeTFuser: light-weight end-to-end Transformer-based sensor fusion for autonomous driving with multi-task learning [EB/OL]. (2023-10-19) [2024-12-05]. https://arxiv.org/pdf/2310.13135.
12 YAO J, LI Y, LIU C, et al EHSINet: efficient high-order spatial interaction multi-task network for adaptive autonomous driving perception[J]. Neural Processing Letters, 2023, 55 (8): 11353- 11370
doi: 10.1007/s11063-023-11379-x
13 TAN G, WANG C, LI Z, et al A multi-task network based on dual-neck structure for autonomous driving perception[J]. Sensors, 2024, 24 (5): 1547
doi: 10.3390/s24051547
14 WEI X, CHEN Y. Joint extraction of long-distance entity relation by aggregating local- and semantic-dependent features [J]. Wireless Communications and Mobile Computing, 2022: 3763940.
15 YE H, XU D InvPT++: inverted pyramid multi-task Transformer for visual scene understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (12): 7493- 7508
doi: 10.1109/TPAMI.2024.3397031
16 FAN Kang, ZHONG Ming’en, TAN Jiawei, et al Traffic scene perception algorithm with joint semantic segmentation and depth estimation[J]. Journal of Zhejiang University: Engineering Science, 2024, 58 (4): 684-695
17 CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 3213–3223.
18 NISHI K, KIM J, LI W, et al. Joint-task regularization for partially labeled multi-task learning [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 16152–16162.
19 LI W, LIU X, BILEN H. Learning multiple dense prediction tasks from partially annotated data [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 18857–18867.
20 LOPES I, VU T H, CHARETTE R. Cross-task attention mechanism for dense multi-task learning [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 2328–2337.
21 TAGHAVI P, LANGARI R, PANDEY G. SwinMTL: a shared architecture for simultaneous depth estimation and semantic segmentation from monocular camera images [EB/OL]. (2024-03-15) [2024-12-05]. https://arxiv.org/abs/2403.10662.
22 QASHQAI D, MOUSAVIAN E, SHOKOUHI S B, et al. CSFNet: a cosine similarity fusion network for real-time RGB-X semantic segmentation of driving scenes [EB/OL]. (2024-07-01) [2024-12-05]. https://arxiv.org/pdf/2407.01328.
23 JEEVAN P, VISWANATHAN K, SETHI A. WaveMix: a resource-efficient neural network for image analysis [EB/OL]. (2024-03-28) [2024-12-05]. https://arxiv.org/pdf/2205.14375.
24 GUO Z, BIAN L, HUANG X, et al. DSNet: a novel way to use atrous convolutions in semantic segmentation [EB/OL]. (2024-06-06) [2024-12-05]. https://arxiv.org/pdf/2406.03702.
25 ZHANG J, LIU H, YANG K, et al CMX: cross-modal fusion for RGB-X semantic segmentation with Transformers[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24 (12): 14679- 14694
doi: 10.1109/TITS.2023.3300537
26 CAI H, LI J, HU M, et al. EfficientViT: multi-scale linear attention for high-resolution dense prediction [EB/OL]. (2024-02-06) [2024-12-05]. https://arxiv.org/pdf/2205.14756.
27 ZHOU K, BIAN J, XIE Q, et al. Manydepth2: motion-aware self-supervised multi-frame monocular depth estimation in dynamic scenes [EB/OL]. (2024-10-11) [2024-12-05]. https://arxiv.org/pdf/2312.15268v6.
28 LI Z, CHEN Z, LIU X, et al DepthFormer: exploiting long-range correlation and local information for accurate monocular depth estimation[J]. Machine Intelligence Research, 2023, 20 (6): 837- 854
doi: 10.1007/s11633-023-1458-0