Journal of Zhejiang University (Engineering Science), 2024, Vol. 58, Issue 11: 2219-2229. DOI: 10.3785/j.issn.1008-973X.2024.11.003
Computer Technology and Control Engineering
Point cloud 3D object detection algorithm based on local information fusion
Linjie ZHANG1(),Zhilei CHAI1,2,*(),Ning WANG1
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China
Abstract:

A three-dimensional object detection algorithm with a local information encoding module and a late cross-fusion module was proposed to address the lack of accurate spatial position information in current point-cloud-based three-dimensional object detection algorithms. Global features were encoded efficiently with 3D sparse convolution during the feature extraction stage. The local information encoding module leveraged the raw point cloud inside each object to construct fine-grained semantic information, which was reweighted by a self-attention mechanism to strengthen the representation of local features. A cross-fusion module was introduced to enable interaction between local and global features, producing more expressive features for object detection. The proposed method was validated on the public KITTI and Waymo datasets. The average precision at IoU 0.7 reached 91.60%, 82.53% and 77.83% on the easy, moderate and hard tasks of the KITTI dataset, respectively, and 74.92% on the Waymo dataset.

Key words: point cloud    sparse convolution    local information    attention mechanism    cross fusion
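The abstract describes reweighting the fine-grained per-point features of each object with a self-attention mechanism before fusion. A minimal single-head sketch in NumPy of this kind of reweighting (the learned query/key/value projections of the actual module are replaced by identities, and function names are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_reweight(local_feats):
    """Reweight the (N, C) features of the N raw points inside one
    object proposal by single-head self-attention. Learned Q/K/V
    projections are omitted for illustration."""
    n, c = local_feats.shape
    q = k = v = local_feats
    attn = softmax(q @ k.T / np.sqrt(c))  # (N, N) pairwise weights
    return attn @ v                       # context-aggregated features
```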
Received: 2023-07-03; Published: 2024-10-23
CLC: TP 391
Funding: National Natural Science Foundation of China (61972180); Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence.
Corresponding author: Zhilei CHAI. E-mail: sanmu_mu@163.com; zlchai@jiangnan.edu.cn
About the author: Linjie ZHANG (1998—), male, master's student, researching perception systems for autonomous driving. orcid.org/0009-0003-3179-4165. E-mail: sanmu_mu@163.com

Cite this article:


Linjie ZHANG, Zhilei CHAI, Ning WANG. Point cloud 3D object detection algorithm based on local information fusion. Journal of Zhejiang University (Engineering Science), 2024, 58(11): 2219-2229.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.11.003        https://www.zjujournals.com/eng/CN/Y2024/V58/I11/2219

Fig. 1  Overall architecture of the proposed model
Fig. 2  Internal structure of the local information encoding module
Fig. 3  Internal structure of the cross-attention fusion module
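The cross-attention fusion of Fig. 3 exchanges information between global (voxel) features and local (point) features. A hedged NumPy sketch of one direction of such an exchange, with identity projections and a residual connection standing in for the module's learned layers (names are illustrative assumptions, not the published implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(global_feats, local_feats):
    """global_feats: (M, C) voxel/grid features; local_feats: (N, C)
    per-point features. Each global feature queries all local features
    and adds the aggregated result back (residual), letting spatial
    detail from the raw points flow into the global representation."""
    c = global_feats.shape[1]
    attn = softmax(global_feats @ local_feats.T / np.sqrt(c))  # (M, N)
    return global_feats + attn @ local_feats
```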
Method | Modality | AP3D (Easy/Moderate/Hard/mAP)/% | APBEV (Easy/Moderate/Hard/mAP)/%
Point-GNN[16] | L | 88.33/79.47/72.29/80.03 | 93.11/89.17/83.90/88.73
3DSSD[18] | L | 88.36/79.57/74.55/80.83 | 92.66/89.02/85.86/89.18
PV-RCNN[30] | L | 90.25/81.43/76.82/82.83 | 94.98/90.65/86.14/90.60
Voxel-RCNN[9] | L | 90.90/81.62/77.06/83.19 | 94.85/88.83/86.13/89.94
CT3D[8] | L | 87.83/81.77/77.16/82.25 | 92.36/88.83/84.07/88.42
Pyramid-PV[28] | L | 88.39/82.08/77.49/82.65 | 92.19/88.84/86.21/89.08
VoTr[21] | L | 89.90/82.09/79.14/83.71 | 94.03/90.34/86.14/90.17
SPG[22] | L | 90.50/82.13/78.90/83.84 | 94.33/88.70/85.98/89.67
VoxSet[27] | L | 88.53/82.06/77.46/82.68 | —
PDV[31] | L | 90.43/81.86/77.36/83.22 | 94.56/90.48/86.23/90.42
VFF[32] | L+I | 89.50/82.09/79.29/83.62 | —
PG-RCNN[23] | L | 89.38/82.13/77.33/82.88 | 93.39/89.46/86.54/89.80
PVT-SSD[24] | L | 90.65/82.29/76.85/83.26 | 95.23/91.63/86.43/91.10
DVF-PV[25] | L+I | 90.99/82.40/77.37/83.58 | —
Ours | L | 91.60/82.53/77.83/83.99 | 95.59/91.37/86.72/91.23
Table 1  Comparison of detection results of different algorithms on the KITTI test set
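AP0.7 in the tables above counts a detection as a true positive when its 3D IoU with a ground-truth box is at least 0.7. A simplified axis-aligned IoU check (the official KITTI evaluation uses rotated boxes; this sketch ignores yaw, and the function name is illustrative):

```python
def iou3d_axis_aligned(a, b):
    """a, b: boxes as (x_min, y_min, z_min, x_max, y_max, z_max).
    Returns intersection volume over union volume."""
    inter = 1.0
    for i in range(3):  # overlap extent along x, y, z
        inter *= max(0.0, min(a[i + 3], b[i + 3]) - max(a[i], b[i]))
    vol = lambda t: (t[3] - t[0]) * (t[4] - t[1]) * (t[5] - t[2])
    union = vol(a) + vol(b) - inter
    return inter / union if union > 0 else 0.0

# A detection matched to a ground truth is kept only if IoU >= 0.7.
```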
Method | AP3D (Easy/Moderate/Hard)/% | APBEV (Easy/Moderate/Hard)/%
PV-RCNN[30] | 92.57/84.43/82.69 | 95.76/91.11/88.93
Voxel-RCNN[9] | 92.38/85.29/82.86 | 95.52/91.25/88.99
PDV[31] | 92.56/85.29/83.05 | —
VFF[32] | 92.47/85.65/83.38 | 95.62/91.75/91.39
CT3D[8] | 92.85/85.82/83.46 | 96.14/91.88/89.63
Ours | 93.27/86.00/83.57 | 96.66/92.11/89.75
Table 2  Comparison of detection results of different algorithms on the KITTI validation set
Method | AP/APH (LEVEL_1): d=0–30 m | d=30–50 m | d>50 m | mean | AP/APH (LEVEL_2): d=0–30 m | d=30–50 m | d>50 m | mean
SECOND[20] | 88.66/88.18 | 67.35/66.70 | 42.89/42.09 | 70.07/69.52 | 87.33/86.86 | 60.92/60.23 | 32.39/31.77 | 61.63/61.14
PV-RCNN[30] | 91.30/90.56 | 73.00/72.31 | 51.35/50.34 | 74.70/74.09 | 89.75/89.29 | 66.32/65.68 | 39.27/38.46 | 66.05/65.50
Voxel-RCNN[9] | 90.81/90.36 | 72.43/71.78 | 50.37/49.47 | 73.90/73.32 | 89.50/89.05 | 65.68/65.08 | 38.32/37.61 | 65.10/64.58
Ours | 91.20/90.77 | 73.28/72.68 | 52.40/51.45 | 74.92/74.38 | 89.91/89.48 | 66.58/66.02 | 40.04/39.29 | 66.19/65.69
Table 3  Comparison of detection results of different algorithms on the Waymo validation set
Experiment | LPE | SIC | CAF | AP3D (Easy/Moderate/Hard)/%
Baseline |  |  |  | 92.38/85.29/82.86
(a) |  |  |  | 92.80/85.48/83.21
(b) |  |  |  | 93.04/85.83/83.50
(c) |  |  |  | 93.27/86.00/83.57
Table 4  Ablation results of the proposed model on the KITTI dataset
Fig. 4  Comparison of visualization results on the KITTI and Waymo datasets
Method | CON | GRU | SE | ACAF | AP3D/%
Baseline |  |  |  |  | 84.52
Method 1 |  |  |  |  | 84.92
Method 2 |  |  |  |  | 85.33
Method 3 |  |  |  |  | 85.50
Method 4 |  |  |  |  | 85.77
Table 5  Performance comparison of different fusion strategies for local information fusion (KITTI)
Model | t/s | C/GB | AP3D/%
Voxel-RCNN[9] | 0.0412 | 2.78 | 85.29
PV-RCNN[30] | 0.1288 | 9.27 | 84.43
Point-RCNN[14] | 0.1532 | 7.71 | 78.63
Ours | 0.0672 | 6.64 | 86.00
Table 6  Comparison of performance and efficiency between the proposed method and other models
1 MAO J, SHI S, WANG X, et al. 3D object detection for autonomous driving: a comprehensive survey [J]. International Journal of Computer Vision, 2023, 131: 1909-1963. doi: 10.1007/s11263-023-01790-1
2 MUHAMMAD K, HUSSAIN T, ULLAH H, et al. Vision-based semantic segmentation in scene understanding for autonomous driving: recent achievements, challenges, and outlooks [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 22694-22715. doi: 10.1109/TITS.2022.3207665
3 BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of lidar sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9297-9307.
4 LIU Z, WU S, JIN S, et al. Investigating pose representations and motion contexts modeling for 3D motion prediction [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(1): 681-697.
5 AKSAN E, KAUFMANN M, CAO P, et al. A spatio-temporal transformer for 3D human motion prediction [C]// International Conference on 3D Vision. [S.l.]: IEEE, 2021: 567-574.
6 CUI A, CASAS S, SADAT A, et al. LookOut: diverse multi-future prediction and planning for self-driving [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 16107-16116.
7 DERUYTTERE T, VANDENHENDE S, GRUJICIC D, et al. Talk2Car: taking control of your self-driving car [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: ACL, 2019: 2088-2098.
8 SHENG H, CAI S, LIU Y, et al. Improving 3D object detection with channel-wise transformer [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 2743-2752.
9 DENG J, SHI S, LI P, et al. Voxel R-CNN: towards high performance voxel-based 3D object detection [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2021: 1201-1209.
10 GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C]// IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 3354-3361.
11 SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2446-2454.
12 HUO Weile, JING Tao, REN Shuang. Review of 3D object detection for autonomous driving [J]. Computer Science, 2023, 50(7): 107-118.
13 QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 652-660.
14 SHI S, WANG X, LI H. PointRCNN: 3D object proposal generation and detection from point cloud [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 770-779.
15 QI C R, LITANY O, HE K, et al. Deep Hough voting for 3D object detection in point clouds [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9277-9286.
16 SHI W, RAJKUMAR R. Point-GNN: graph neural network for 3D object detection in a point cloud [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 1711-1719.
17 YANG Z, SUN Y, LIU S, et al. STD: sparse-to-dense 3D object detector for point cloud [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 1951-1960.
18 YANG Z, SUN Y, LIU S, et al. 3DSSD: point-based 3D single stage object detector [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11040-11048.
19 ZHOU Y, TUZEL O. VoxelNet: end-to-end learning for point cloud based 3D object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4490-4499.
20 YAN Y, MAO Y, LI B. SECOND: sparsely embedded convolutional detection [J]. Sensors, 2018, 18(10): 3337-3353. doi: 10.3390/s18103337
21 MAO J, XUE Y, NIU M, et al. Voxel transformer for 3D object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 3164-3173.
22 XU Q, ZHOU Y, WANG W, et al. SPG: unsupervised domain adaptation for 3D object detection via semantic point generation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 15446-15456.
23 KOO I, LEE I, KIM S H, et al. PG-RCNN: semantic surface point generation for 3D object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Vancouver: IEEE, 2023: 18142-18151.
24 YANG H, WANG W, CHEN M, et al. PVT-SSD: single-stage 3D object detector with point-voxel transformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 13476-13487.
25 MAHMOUD A, HU J S, WASLANDER S L. Dense voxel fusion for 3D object detection [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 663-672.
26 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30(2): 6000-6010.
27 HE C, LI R, LI S, et al. Voxel set transformer: a set-to-set approach to 3D object detection from point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8417-8427.
28 MAO J, NIU M, BAI H, et al. Pyramid R-CNN: towards better performance and adaptability for 3D object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. [S.l.]: IEEE, 2021: 2723-2732.
29 PAN X, XIA Z, SONG S, et al. 3D object detection with Pointformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2021: 7463-7472.
30 SHI S, GUO C, JIANG L, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2020: 10529-10538.
31 HU J S, KUAI T, WASLANDER S L. Point density-aware voxels for lidar 3D object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 8469-8478.