基于轴向注意力的多任务自动驾驶环境感知算法

doi:10.3785/j.issn.1008-973X.2025.04.012

浙江大学学报(工学版)

2025, Vol. 59

Issue (4): 769-777 DOI: 10.3785/j.issn.1008-973X.2025.04.012

计算机技术与控制工程

基于轴向注意力的多任务自动驾驶环境感知算法

李沈崇1(

),曾新华2,*(

),林传渠1

1. 湖州师范学院信息工程学院，浙江湖州 313000
2. 复旦大学工程与应用技术研究院，上海 200433

Multi-task environment perception algorithm for autonomous driving based on axial attention

Shenchong LI1(

),Xinhua ZENG2,*(

),Chuanqu LIN1

1. School of Information Engineering, Huzhou University, Huzhou 313000, China
2. Academy for Engineering and Technology, Fudan University, Shanghai 200433, China

全文: PDF(4344 KB) HTML

摘要：

为了满足自动驾驶要求并提升多模型间的协同效果，基于共享主干网络提出新的算法. 为了提升模型的位置表达能力，将轴向注意力机制加入主干网络，在保持轻量化特征提取的前提下建立全局关键点间的联系. 在多尺度信息提取阶段，引入自适应权重分配方法和三维注意力机制，降低不同尺度特征间的信息冲突. 根据难分样本区域优化损失函数，加强所提算法在难样本区域的细节识别能力. 在BDD100K数据集上的实验结果表明，相比YOLOP，所提算法在交通目标检测任务中的平均精度均值（在IoU=50%的情况下）提高了3.3个百分点，在道路可行驶区域分割任务中的mIoU提升了1.0个百分点，车道线检测准确率提升了6.7个百分点，推理速度为223.7帧/s. 所提算法在交通目标检测、可行驶区域分割和车道线检测任务上了均表现出良好的性能，能够较好平衡检测精度与推理速度.

关键词： 多任务学习; 目标检测; 语义分割; 自动驾驶; 特征融合; 轴向注意力

Abstract:

A new algorithm was proposed based on a shared backbone network to meet the autonomous driving requirements and to improve the synergy effect among multiple models. An axial attention mechanism was added to the backbone network, and connections between global key points were established while maintaining lightweight feature extraction to enhance the location representation of the model. The adaptive weight allocation method, along with the implementation of a three-dimensional attention mechanism, was devised to mitigate the information conflict that emerges from the diverse scale features present in the multi-scale information extraction phase. The loss function was optimized based on the challenging sample region, and the capacity of the proposed algorithm to capture intricate details in the difficult sample region was strengthened. Experimental results in the BDD100K dataset showed that compared with YOLOP, the proposed algorithm improved the mean average accuracy in the traffic target detection task (at IoU=50%) by 3.3 percentage points, the mIoU in road drivable area segmentation task by 1.0 percentage points, the accuracy of lane line detection by 6.7 percentage points, and the reasoning speed was 223.7 frames per second. The proposed algorithm demonstrates excellent performance in traffic target detection, drivable area segmentation, and lane line detection, and achieves a good balance between detection accuracy and reasoning speed at the same time.

Key words: multi-task learning object detection semantic segmentation automatic driving feature fusion axial attention

收稿日期: 2024-01-22 出版日期: 2025-04-25

CLC:

TP 391

基金资助: 国家自然科学基金资助项目（62373148）.

通讯作者: 曾新华 E-mail: 994930867@qq.com;zengxh@fudan.edu.cn

作者简介: 李沈崇（1997—），男，硕士生，从事智能信息处理研究. orcid.org/0009-0004-3041-2636. E-mail：994930867@qq.com

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	李沈崇
	曾新华
	林传渠

引用本文:

李沈崇,曾新华,林传渠. 基于轴向注意力的多任务自动驾驶环境感知算法[J]. 浙江大学学报(工学版), 2025, 59(4): 769-777.

Shenchong LI,Xinhua ZENG,Chuanqu LIN. Multi-task environment perception algorithm for autonomous driving based on axial attention. Journal of ZheJiang University (Engineering Science), 2025, 59(4): 769-777.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.04.012 或 https://www.zjujournals.com/eng/CN/Y2025/V59/I4/769

图 1 基于轴向注意力的多任务自动驾驶环境感知算法的网络结构

图 2 Sea-Attention模块的网络结构

图 3 改进的跨阶段局部模块结构

图 4 自适应权重融合模块结构

图 5 SimAM三维注意力机制模块结构

图 6 解耦检测头结构

图 7 BDD100K数据集中不同场景及天气图像示例

表 1 不同算法在BDD100K数据集上的交通目标检测结果

表 2 不同算法在BDD100K数据集上的可行驶区域分割结果

表 3 不同算法在BDD100K数据集上的车道线检测结果

图 8 不同算法在白天道路场景中的交通环境感知对比

图 9 不同算法在夜晚道路场景中的交通环境感知对比

表 4 所提算法的模块消融实验结果

图 10 加入Sea-Attention模块前后的热力图

1	LIANG X, NIU M, HAN J, et al. Visual exemplar driven task-prompting for unified perception in autonomous driving [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver: IEEE, 2023: 9611–9621.
2	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector [C]// Computer Vision – ECCV 2016 . [S. l.]: Springer, 2016: 21–37.
3	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 779–788.
4	REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018–04–08)[2024–03–08]. https://www.arxiv.org/pdf/1804.02767.
5	BOCHKOVSKIY A, WANG C Y, LIAO H M, et al. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020–4–23)[2024–01–22]. https://arxiv.org/pdf/2004.10934.
6	蒋超, 张豪, 章恩泽, 等基于改进YOLOv5s的行人车辆目标检测算法[J]. 扬州大学学报: 自然科学版, 2022, 25 (6): 45- 49 JIANG Chao, ZHANG Hao, ZHANG Enze, et al Pedestrian and vehicle target detection algorithm based on the improved YOLOv5s[J]. Journal of Yangzhou University: Natural Science Edition, 2022, 25 (6): 45- 49
7	韩俊, 袁小平, 王准, 等基于YOLOv5s的无人机密集小目标检测算法[J]. 浙江大学学报: 工学版, 2023, 57 (6): 1224- 1233 HAN Jun, YUAN Xiaoping, WANG Zhun, et al UAV dense small target detection algorithm based on YOLOv5s[J]. Journal of Zhejiang University: Engineering Science, 2023, 57 (6): 1224- 1233
8	GIRSHICK R. Fast R-CNN [C]// Proceedings of the IEEE International Conference on Computer Vision . Santiago: IEEE, 2015: 1440–1448.
9	REN S, HE K, GIRSHICK R, et al Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39 (6): 1137- 1149 doi: 10.1109/TPAMI.2016.2577031
10	CHEN L C, PAPANDREOU G, KOKKINOS I, et al DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (4): 834- 848
11	LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 5168–5177.
12	ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu. IEEE, 2017: 6230–6239.
13	YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation [C]// Computer Vision – ECCV 2018 . [S.l.]: Springe, 2018: 334–349.
14	FAN M, LAI S, HUANG J, et al. Rethinking BiSeNet for real-time semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 9711–9720.
15	WANG Z, REN W, QIU Q. LaneNet: real-time lane detection networks for autonomous driving [EB/OL]. (2018–07–04)[2024–03–08]. https://arxiv.org/pdf/1807.01726.
16	TEICHMANN M, WEBER M, ZOELLNER M, et al. MultiNet: real-time joint semantic reasoning for autonomous driving [EB/OL]. (2016–12–22)[2024–03–08]. https://arxiv.org/pdf/1612.07695.
17	QIAN Y, DOLAN J M, YANG M DLT-Net: joint detection of drivable areas, lane lines, and traffic objects[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21 (11): 4670- 4679
18	WU D, LIAO M W, ZHANG W T, et al YOLOP: you only look once for panoptic driving perception[J]. Machine Intelligence Research, 2022, 19 (6): 550- 562
19	VU D, NGO B, PHAN H. HybridNets: end-to-end perception network [EB/OL]. (2022–03–17)[2024–03–08]. https://arxiv.org/pdf/2203.09035.
20	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [EB/OL]. (2021–06–03)[2024–03–08]. https://arxiv.org/pdf/2010.11929.
21	CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers [C]// Computer Vision – ECCV 2020 . [S.l.]: Springer, 2020: 213–229.
22	LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 9992–10002.
23	WAN Q, HUANG Z, LU J, et al. SeaFormer++: squeeze-enhanced axial transformer for mobile semantic segmentation [EB/OL]. (2023–06–09)[2024–03–08]. https://arxiv.org/pdf/2301.13156.
24	LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 8759–8768.
25	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Computer Vision – ECCV 2018 . [S.l.]: Springer, 2018: 3–19.
26	YANG L, ZHANG R Y, LI L, et al. SimAM: a simple, parameter-free attention module for convolutional neural networks [C]// Proceeding of the 38th International Conference on Machine Learning . Vienna: ACM, 2021: 11863-11874.
27	DUAN K, BAI S, XIE L, et al. CenterNet: keypoint triplets for object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 6569–6578.
28	ZHANG Y, ZHANG W, YU J, et al Complete and accurate holly fruits counting using YOLOX object detection[J]. Computers and Electronics in Agriculture, 2022, 198: 107062
29	LI X, WANG W, WU L, et al Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection[J]. Advances in Neural Information Processing Systems, 2020, 33: 21002- 21012

[1]	陈文强,王东丹,朱文英,汪勇杰,王涛. 基于时空图注意力网络的车辆多模态轨迹预测模型[J]. 浙江大学学报(工学版), 2025, 59(3): 443-450.
[2]	刘登峰,郭文静,陈世海. 基于内容引导注意力的车道线检测网络[J]. 浙江大学学报(工学版), 2025, 59(3): 451-459.
[3]	王浚银,文斌,沈艳军,张俊,王子豪. 基于改进YOLOv7-tiny的铝型材表面缺陷检测方法[J]. 浙江大学学报(工学版), 2025, 59(3): 523-534.
[4]	何永福,谢世维,于佳禄,陈思宇. 考虑跨层特征融合的抛洒风险车辆检测方法[J]. 浙江大学学报(工学版), 2025, 59(2): 300-309.
[5]	董红召,林少轩,佘翊妮. 交通目标YOLO检测技术的研究进展[J]. 浙江大学学报(工学版), 2025, 59(2): 249-260.
[6]	于家艺,吴秦. 基于上下文信息增强和深度引导的单目3D目标检测[J]. 浙江大学学报(工学版), 2025, 59(1): 89-99.
[7]	刘欢,李云红,张蕾涛,郭越,苏雪平,朱耀麟,侯乐乐. 基于MA-ConvNext网络和分步关系知识蒸馏的苹果叶片病害识别[J]. 浙江大学学报(工学版), 2024, 58(9): 1757-1767.
[8]	李凡,杨杰,冯志成,陈智超,付云骁. 基于图像识别的弓网接触点检测方法[J]. 浙江大学学报(工学版), 2024, 58(9): 1801-1810.
[9]	黎文皓,季彦婕,吴浩,贾叶雯,张水潮. 基于多智体的自动驾驶汽车停车空载收费策略[J]. 浙江大学学报(工学版), 2024, 58(8): 1659-1670.
[10]	杨军,张琛. 基于边界点估计与稀疏卷积神经网络的三维点云语义分割[J]. 浙江大学学报(工学版), 2024, 58(6): 1121-1132.
[11]	宋娟,贺龙喜,龙会平. 基于深度学习的隧道衬砌多病害检测算法[J]. 浙江大学学报(工学版), 2024, 58(6): 1161-1173.
[12]	刘毅,陈一丹,高琳,洪姣. 基于多尺度特征融合的轻量化道路提取模型[J]. 浙江大学学报(工学版), 2024, 58(5): 951-959.
[13]	范康,钟铭恩,谭佳威,詹泽辉,冯妍. 联合语义分割和深度估计的交通场景感知算法[J]. 浙江大学学报(工学版), 2024, 58(4): 684-695.
[14]	曹寅,秦俊平,高彤,马千里,任家琪. 基于生成对抗网络的文本两阶段生成高质量图像方法[J]. 浙江大学学报(工学版), 2024, 58(4): 674-683.
[15]	钱庆杰,余军合,战洪飞,王瑞,胡健. 基于DL-BiGRU多特征融合的注塑件尺寸预测方法[J]. 浙江大学学报(工学版), 2024, 58(3): 646-654.

Viewed

Full text

Abstract

Cited

Shared

Discussed