Journal of Zhejiang University (Engineering Science)  2025, Vol. 59, Issue (1): 89-99    DOI: 10.3785/j.issn.1008-973X.2025.01.009
Computer and Control Engineering
Monocular 3D object detection based on context information enhancement and depth guidance
Jiayi YU1, Qin WU1,2,*
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi 214122, China

Abstract:

A monocular 3D object detection method based on context information enhancement and depth guidance was proposed to fully exploit the feature information provided by a monocular image. An efficient context information enhancement module was designed to adaptively enhance the context information of multi-scale objects using multiple large kernel convolutions, and depth-wise separable convolution and strip convolution were adopted to effectively reduce the parameter count and computational complexity of the large kernel convolutions. A statistical analysis of the prediction errors of each attribute of the 3D bounding box showed that inaccurate prediction of the length and depth attributes was the primary cause of large deviations in the predicted box. A depth error weighted loss function was therefore designed to supervise the length and depth predictions during training, improving the prediction accuracy of these two attributes and thereby the accuracy of the predicted 3D bounding box. Experiments on the KITTI dataset showed that the proposed method achieved higher average precision than existing monocular 3D object detection methods at multiple difficulty levels of the dataset.

Key words: monocular 3D object detection    large kernel convolution    depth-wise separable convolution    strip convolution    multi-scale object
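The abstract states that strip convolutions are used to cut the cost of the large kernel convolutions in the context information enhancement module. The sketch below (NumPy, single channel; an illustration of the underlying identity, not the paper's actual implementation) shows why strips are cheap: a k×k kernel that factorizes as an outer product can be applied as a k×1 strip followed by a 1×k strip, producing the same output with 2k instead of k² weights.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D cross-correlation for a single channel."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
v = rng.standard_normal((7, 1))   # 7x1 vertical strip kernel
h = rng.standard_normal((1, 7))   # 1x7 horizontal strip kernel

full = conv2d_valid(x, v @ h)                # one 7x7 kernel: 49 weights
strip = conv2d_valid(conv2d_valid(x, v), h)  # two strip kernels: 14 weights
print(np.allclose(full, strip))  # the strip decomposition matches the full conv
```

A general k×k kernel is not exactly an outer product, which is why the module (as described in the abstract) combines strip convolutions with depth-wise separable convolutions rather than relying on this identity alone.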
Received: 2023-11-29    Published: 2025-01-18
CLC:  TP 391  
Supported by the National Natural Science Foundation of China (61972180).
Corresponding author: Qin WU    E-mail: 3076710949@qq.com; qinwu@jiangnan.edu.cn
About the author: Jiayi YU (1999—), male, master's student, engaged in object detection research. orcid.org/0009-0001-2432-9244. E-mail: 3076710949@qq.com

Cite this article:

Jiayi YU,Qin WU. Monocular 3D object detection based on context information enhancement and depth guidance. Journal of ZheJiang University (Engineering Science), 2025, 59(1): 89-99.

Link this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.01.009        https://www.zjujournals.com/eng/CN/Y2025/V59/I1/89

Fig. 1  Architecture of the context information enhancement and depth guidance model
Fig. 2  Histograms of the prediction errors of different attributes of the baseline model on the KITTI validation set
Model             Venue      AP3D|R40 (Easy / Moderate / Hard)   APBEV|R40 (Easy / Moderate / Hard)
CaDDN[21]         CVPR21     19.17 / 13.41 / 11.46               27.94 / 18.91 / 17.19
Monodle[22]       CVPR21     17.23 / 12.26 / 10.29               24.79 / 18.89 / 16.00
GrooMeD-NMS[23]   CVPR21     18.10 / 12.32 / 9.65                26.19 / 18.27 / 14.05
MonoEF[24]        CVPR21     21.29 / 13.87 / 11.71               29.03 / 19.70 / 17.26
MonoFlex[25]      CVPR21     19.94 / 13.89 / 12.07               28.23 / 19.75 / 16.89
AutoShape[7]      ICCV21     22.47 / 14.17 / 11.36               30.66 / 20.08 / 15.95
GUPNet[10]        ICCV21     22.26 / 15.02 / 13.12               30.29 / 21.19 / 18.20
PCT[26]           NeurIPS21  21.00 / 13.37 / 11.31               29.65 / 19.03 / 15.92
MonoGround[27]    CVPR22     21.37 / 14.36 / 12.62               30.07 / 20.47 / 17.74
HomoLoss[28]      CVPR22     21.75 / 14.94 / 13.07               29.60 / 20.68 / 17.81
MonoDTR[14]       CVPR22     21.99 / 15.39 / 12.73               28.59 / 20.38 / 17.14
MonoJSG[29]       CVPR22     24.69 / 16.14 / 13.64               32.59 / 21.26 / 18.18
DCD[9]            ECCV22     23.81 / 15.90 / 13.21               32.55 / 21.50 / 18.25
DEVIANT[30]       ECCV22     21.88 / 14.46 / 11.89               29.65 / 20.44 / 17.43
DID-M3D[17]       ECCV22     24.40 / 16.29 / 13.75               32.95 / 22.76 / 19.83
SGM3D[31]         RAL22      22.46 / 14.65 / 12.97               31.49 / 21.37 / 18.43
MonoCon[32]       AAAI22     22.50 / 16.46 / 13.95               31.12 / 22.10 / 19.00
MonoRCNN++[33]    WACV23     20.08 / 13.72 / 11.34               —
MonoEdge[34]      WACV23     21.08 / 14.47 / 12.73               28.80 / 20.35 / 17.57
Ours              —          26.74 / 16.67 / 14.33               34.73 / 22.84 / 19.52
Table 1  Comparison of monocular 3D object detection accuracy of different models on the KITTI test set
Exp   AP3D|R40 (Easy / Moderate / Hard)   APBEV|R40 (Easy / Moderate / Hard)
1     25.42 / 17.09 / 14.08               33.90 / 23.30 / 19.51
2     26.73 / 18.25 / 15.19               34.20 / 23.72 / 20.90
3     26.03 / 17.58 / 14.56               34.61 / 24.59 / 21.05
4     26.06 / 17.84 / 14.72               33.02 / 24.17 / 20.63
5     27.28 / 18.11 / 14.93               35.35 / 24.56 / 20.97
6     27.11 / 18.23 / 15.04               35.43 / 24.69 / 21.01
7     27.03 / 18.25 / 15.00               35.13 / 23.96 / 20.97
8     27.56 / 18.32 / 15.13               35.85 / 24.82 / 21.19
Table 2  Monocular 3D object detection accuracy under different combinations of the ECIE module, $ {L}_{r}^{d} $ and $ {L}_{r}^{l} $
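The depth error weighted loss ablated above supervises the length and depth predictions during training. The paper's exact formulation is not reproduced on this page, so the following is only a minimal, hypothetical sketch of such a loss: the regression terms for depth and length are scaled up when the relative depth error is large, so poorly localized samples receive stronger supervision.

```python
import numpy as np

def depth_error_weighted_loss(pred_depth, gt_depth, pred_len, gt_len):
    """Hypothetical sketch of a depth error weighted regression loss.
    Samples with a larger relative depth error are weighted more heavily;
    the weighting scheme and hyperparameters in the paper may differ."""
    depth_err = np.abs(pred_depth - gt_depth)
    len_err = np.abs(pred_len - gt_len)
    # Weight grows with the relative depth error; w >= 1 everywhere.
    w = 1.0 + depth_err / np.maximum(gt_depth, 1e-6)
    return np.mean(w * (depth_err + len_err))

# Illustrative values only (metres), not data from the paper.
loss = depth_error_weighted_loss(
    np.array([10.0, 40.0]), np.array([9.0, 50.0]),
    np.array([4.0, 4.5]), np.array([3.9, 4.2]))
print(loss)
```

Because the weight is at least 1, this sketch never down-weights a sample below a plain L1 loss; it only emphasizes samples whose depth is badly estimated.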
Kernel combination   AP3D|R40 (Easy / Moderate / Hard)   APBEV|R40 (Easy / Moderate / Hard)
1                    24.54 / 17.12 / 14.14               32.65 / 22.95 / 20.04
2                    26.02 / 17.35 / 14.38               32.63 / 22.69 / 19.99
3                    25.18 / 17.36 / 14.35               33.29 / 23.20 / 20.39
4                    25.88 / 17.64 / 14.52               33.58 / 23.26 / 20.42
5                    26.60 / 17.73 / 14.68               32.67 / 22.80 / 19.37
6                    26.62 / 17.88 / 14.72               34.38 / 23.58 / 20.67
7                    26.73 / 18.25 / 15.19               34.20 / 23.72 / 20.90
Table 3  Monocular 3D object detection accuracy with different combinations of the kernel sizes 7, 11 and 21 in the efficient context information enhancement module
Fig. 3  Object detection accuracy versus the hyperparameter of the depth error weighted loss
Convolution operation                         Params / 10^6   Complexity / 10^9
Ordinary convolution                          2.609           80.153
Depth-wise separable convolution              0.045           1.376
Depth-wise separable + strip convolution      0.011           0.328
Table 4  Comparison of parameter count and complexity of different convolution operations in the efficient context information enhancement module
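The trend in Table 4 can be reproduced with simple parameter-count arithmetic. The channel width and kernel size below are assumptions chosen for illustration (the exact configuration behind Table 4 is not stated in this excerpt), so the absolute numbers differ from the table, but the relative savings of the depth-wise and strip decompositions follow the same pattern.

```python
# Illustrative parameter-count arithmetic; C and k are assumed values.
C, k = 64, 21  # channels (in = out) and large kernel size

standard = k * k * C * C            # ordinary k x k convolution
dw_separable = k * k * C + C * C    # depth-wise k x k + 1x1 pointwise
dw_strip = (k + k) * C + C * C      # depth-wise 1xk and kx1 strips + 1x1 pointwise

print(standard, dw_separable, dw_strip)  # each decomposition is far cheaper
```

Under these assumptions the ordinary convolution needs about 1.8M weights, the depth-wise separable version about 32K, and the strip version under 7K, mirroring the orders-of-magnitude reduction reported in Table 4.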
$ {L}_{r} $   AP3D (IoU=0.7 / IoU=0.5)
              Overall          d∈(0,30] m       d∈(30,50] m      d>50 m
w/o           1.95 / 9.27      5.63 / 19.56     0.91 / 6.72      0.15 / 1.70
w/            2.33 / 11.47     7.23 / 23.52     0.73 / 7.10      0.24 / 2.53
Table 5  Monocular 3D object detection accuracy with the depth error weighted loss on the Waymo dataset
Fig. 4  Visualization of object detection results of different models on the KITTI validation set
1 LIU Y X, YUAN Y X, LIU M. Ground-aware monocular 3D object detection for autonomous driving [J]. IEEE Robotics and Automation Letters, 2021, 6(2): 919-926
doi: 10.1109/LRA.2021.3052442
2 SIMONELLI A, BULÒ S R, PORZI L, et al. Disentangling monocular 3D object detection [C]// IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 1991–1999.
3 WANG Y, CHAO W L, GARG D, et al. Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 8445–8453.
4 MA X Z, LIU S N, XIA Z Y, et al. Rethinking pseudo-LiDAR representation [C]// European Conference on Computer Vision . Glasgow: Springer, 2020: 311–327.
5 PENG L, LIU F, YU Z X, et al. LiDAR point cloud guided monocular 3D object detection [C]// European Conference on Computer Vision . Tel Aviv: Springer, 2022: 123–139.
6 HONG Y, DAI H, DING Y. Cross-modality knowledge distillation network for monocular 3D object detection [C]// European Conference on Computer Vision . Tel Aviv: Springer, 2022: 87–104.
7 LIU Z D, ZHOU D F, LU F X, et al. AutoShape: real-time shape-aware monocular 3D object detection [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 15641–15650.
8 ZHANG Junning, SU Qunxing, LIU Pengyuan, et al. Adaptive monocular 3D object detection algorithm based on spatial constraint [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(6): 1138-1146
9 LI Y Y, CHEN Y T, HE J W, et al. Densely constrained depth estimator for monocular 3D object detection [C]// European Conference on Computer Vision . Tel Aviv: Springer, 2022: 718–734.
10 LU Y, MA X Z, YANG L, et al. Geometry uncertainty projection network for monocular 3D object detection [C]// IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 3111–3121.
11 LIU Z C, WU Z Z, TÓTH R. SMOKE: single-stage monocular 3D object detection via keypoint estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops . Seattle: IEEE, 2020: 996–997.
12 BRAZIL G, LIU X M. M3D-RPN: monocular 3D region proposal network for object detection [C]// IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 9287–9296.
13 ZHANG R R, QIU H, WANG T, et al. MonoDETR: depth-guided transformer for monocular 3D object detection [C]// IEEE/CVF International Conference on Computer Vision . Paris: IEEE, 2023: 9155–9166.
14 HUANG K C, WU T H, SU H T, et al. MonoDTR: monocular 3D object detection with depth-aware transformer [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 4012–4021.
15 YU F, WANG D Q, SHELHAMER E, et al. Deep layer aggregation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 2403–2412.
16 ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points [EB/OL]. (2019–04–25)[2023–11–29]. https://arxiv.org/pdf/1904.07850.
17 PENG L, WU X P, YANG Z, et al. DID-M3D: decoupling instance depth for monocular 3D object detection [C]// European Conference on Computer Vision . Tel Aviv: Springer, 2022: 71–88.
18 GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C]// IEEE Conference on Computer Vision and Pattern Recognition . Providence: IEEE, 2012: 3354–3361.
19 MOUSAVIAN A, ANGUELOV D, FLYNN J, et al. 3D bounding box estimation using deep learning and geometry [C]// IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 7074–7082.
20 SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 2446–2454.
21 READING C, HARAKEH A, CHAE J, et al. Categorical depth distribution network for monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 8555–8564.
22 MA X Z, ZHANG Y M, XU D, et al. Delving into localization errors for monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 4721–4730.
23 KUMAR A, BRAZIL G, LIU X M. GrooMeD-NMS: grouped mathematically differentiable NMS for monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 8973–8983.
24 ZHOU Y S, HE Y, ZHU H Z, et al. Monocular 3D object detection: an extrinsic parameter free approach [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 7556–7566.
25 ZHANG Y P, LU J W, ZHOU J. Objects are different: flexible monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville: IEEE, 2021: 3289–3298.
26 WANG L, ZHANG L, ZHU Y, et al. Progressive coordinate transforms for monocular 3D object detection [C]// The 35th International Conference on Neural Information Processing Systems . [S. l.]: Curran Associates, 2021: 13364–13377.
27 QIN Z Q, LI X. MonoGround: detecting monocular 3D objects from the ground [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 3793–3802.
28 GU J Q, WU B J, FAN L B, et al. Homography loss for monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 1080–1089.
29 LIAN Q, LI P L, CHEN X Z. MonoJSG: joint semantic and geometric cost volume for monocular 3D object detection [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 1070–1079.
30 KUMAR A, BRAZIL G, CORONA E, et al. DEVIANT: depth equivariant network for monocular 3D object detection [C]// European Conference on Computer Vision . Tel Aviv: Springer, 2022: 664–683.
31 ZHOU Z Y, DU L, YE X Q, et al. SGM3D: stereo guided monocular 3D object detection [J]. IEEE Robotics and Automation Letters, 2022, 7(4): 10478-10485
doi: 10.1109/LRA.2022.3191849
32 LIU X P, XUE N, WU T F. Learning auxiliary monocular contexts helps monocular 3D object detection [C]// AAAI Conference on Artificial Intelligence . Vancouver: AAAI, 2022: 1810–1818.
33 SHI X P, CHEN Z X, KIM T K. Multivariate probabilistic monocular 3D object detection [C]// IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa: IEEE, 2023: 4281–4290.