Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (11): 2219-2229    DOI: 10.3785/j.issn.1008-973X.2024.11.003
    
Point cloud 3D object detection algorithm based on local information fusion
Linjie ZHANG1, Zhilei CHAI1,2,*, Ning WANG1
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China

Abstract  

A three-dimensional object detection algorithm with a local information encoding module and a late cross-fusion module was proposed to address the lack of accurate spatial position information in current point-cloud-based three-dimensional object detection algorithms. Global features were efficiently encoded using 3D sparse convolution during the feature extraction phase. The local information encoding module leveraged the raw point cloud inside each object to construct fine-grained semantic details, which were reweighted through a self-attention mechanism to enhance the representation of local features. A cross-fusion module was introduced to enable interaction between local and global features, producing more expressive object detection features. The proposed method was validated on the KITTI and Waymo datasets. On the KITTI dataset, the average precision at IoU 0.7 on the easy, moderate, and hard tasks reached 91.60%, 82.53%, and 77.83%, respectively; on the Waymo dataset, the average precision at IoU 0.7 reached 74.92%.
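The self-attention reweighting step described in the abstract can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product self-attention over the local point features of one object, not the authors' implementation; the feature dimension, point count, and projection matrices are assumptions made purely for illustration (in the real model the projections are learned).

```python
import numpy as np

def self_attention_reweight(feats, wq, wk, wv):
    """Reweight per-point local features with scaled dot-product self-attention.

    feats: (n, d) array of local features for the n points inside one object.
    wq, wk, wv: (d, d) query/key/value projection matrices (learned in practice).
    Returns the attention-reweighted features, shape (n, d).
    """
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    scores = q @ k.T / np.sqrt(feats.shape[1])      # (n, n) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)     # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the n points
    return weights @ v                              # reweighted local features

rng = np.random.default_rng(0)
n, d = 5, 8                                         # 5 points, 8-dim features (toy sizes)
feats = rng.standard_normal((n, d))
out = self_attention_reweight(feats, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```

Each output row is a similarity-weighted mixture of all point features in the object, which is what lets geometrically related points reinforce one another before fusion.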



Key words: point cloud; sparse convolution; local information; attention mechanism; cross fusion
Received: 03 July 2023      Published: 23 October 2024
CLC:  TP 391  
Fund: National Natural Science Foundation of China (61972180); Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence.
Corresponding Authors: Zhilei CHAI     E-mail: sanmu_mu@163.com;zlchai@jiangnan.edu.cn
Cite this article:

Linjie ZHANG,Zhilei CHAI,Ning WANG. Point cloud 3D object detection algorithm based on local information fusion. Journal of ZheJiang University (Engineering Science), 2024, 58(11): 2219-2229.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.11.003     OR     https://www.zjujournals.com/eng/Y2024/V58/I11/2219


Fig.1 Overall architecture of proposed network
Fig.2 Internal structure of local information encoding module
Fig.3 Internal structure of cross-fusion module
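The cross-fusion module shown in Fig.3 exchanges information between the local branch and the sparse-convolution (global) branch. The sketch below illustrates one common way to realize such an interaction, bidirectional cross-attention, in plain NumPy; it is an assumed, simplified stand-in for the module's internal structure, with toy feature counts and dimensions, not the paper's actual design.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with max-subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_fuse(local_feats, global_feats):
    """Bidirectional cross-attention between local and global features (illustrative).

    local_feats:  (m, d) features from the local information encoding branch.
    global_feats: (n, d) features from the 3D sparse-convolution branch.
    Each branch queries the other, and the attended result is added back,
    so both outputs carry information from the opposite branch.
    """
    d = local_feats.shape[1]
    # local queries attend to global keys/values
    l2g = softmax(local_feats @ global_feats.T / np.sqrt(d)) @ global_feats   # (m, d)
    # global queries attend to local keys/values
    g2l = softmax(global_feats @ local_feats.T / np.sqrt(d)) @ local_feats    # (n, d)
    return local_feats + l2g, global_feats + g2l

rng = np.random.default_rng(1)
local = rng.standard_normal((6, 16))    # 6 local (point-level) features
glob = rng.standard_normal((10, 16))    # 10 global (voxel-level) features
fused_local, fused_global = cross_fuse(local, glob)
print(fused_local.shape, fused_global.shape)  # (6, 16) (10, 16)
```

The residual additions keep each branch's original features intact while injecting the cross-attended context, which matches the abstract's goal of producing detection features with stronger joint expressiveness.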
| Method | Modality | AP3D/%: Easy | Mod. | Hard | mAP | APBEV/%: Easy | Mod. | Hard | mAP |
| Point-GNN[16] | L | 88.33 | 79.47 | 72.29 | 80.03 | 93.11 | 89.17 | 83.90 | 88.73 |
| 3DSSD[18] | L | 88.36 | 79.57 | 74.55 | 80.83 | 92.66 | 89.02 | 85.86 | 89.18 |
| PV-RCNN[30] | L | 90.25 | 81.43 | 76.82 | 82.83 | 94.98 | 90.65 | 86.14 | 90.60 |
| Voxel-RCNN[9] | L | 90.90 | 81.62 | 77.06 | 83.19 | 94.85 | 88.83 | 86.13 | 89.94 |
| CT3D[8] | L | 87.83 | 81.77 | 77.16 | 82.25 | 92.36 | 88.83 | 84.07 | 88.42 |
| Pyramid-PV[28] | L | 88.39 | 82.08 | 77.49 | 82.65 | 92.19 | 88.84 | 86.21 | 89.08 |
| VoTr[21] | L | 89.90 | 82.09 | 79.14 | 83.71 | 94.03 | 90.34 | 86.14 | 90.17 |
| SPG[22] | L | 90.50 | 82.13 | 78.90 | 83.84 | 94.33 | 88.70 | 85.98 | 89.67 |
| VoxSet[27] | L | 88.53 | 82.06 | 77.46 | 82.68 | — | — | — | — |
| PDV[31] | L | 90.43 | 81.86 | 77.36 | 83.22 | 94.56 | 90.48 | 86.23 | 90.42 |
| VFF[32] | L+I | 89.50 | 82.09 | 79.29 | 83.62 | — | — | — | — |
| PG-RCNN[23] | I | 89.38 | 82.13 | 77.33 | 82.88 | 93.39 | 89.46 | 86.54 | 89.80 |
| PVT-SSD[24] | I | 90.65 | 82.29 | 76.85 | 83.26 | 95.23 | 91.63 | 86.43 | 91.10 |
| DVF-PV[25] | L+I | 90.99 | 82.40 | 77.37 | 83.58 | — | — | — | — |
| Ours | L | 91.60 | 82.53 | 77.83 | 83.99 | 95.59 | 91.37 | 86.72 | 91.23 |
Tab.1 Comparison of detection results from different algorithms on KITTI test dataset
| Method | AP3D/%: Easy | Mod. | Hard | APBEV/%: Easy | Mod. | Hard |
| PV-RCNN[30] | 92.57 | 84.43 | 82.69 | 95.76 | 91.11 | 88.93 |
| Voxel-RCNN[9] | 92.38 | 85.29 | 82.86 | 95.52 | 91.25 | 88.99 |
| PDV[31] | 92.56 | 85.29 | 83.05 | — | — | — |
| VFF[32] | 92.47 | 85.65 | 83.38 | 95.62 | 91.75 | 91.39 |
| CT3D[8] | 92.85 | 85.82 | 83.46 | 96.14 | 91.88 | 89.63 |
| Ours | 93.27 | 86.00 | 83.57 | 96.66 | 92.11 | 89.75 |
Tab.2 Comparison of detection results from different algorithms on KITTI validation dataset
| Method | AP/APH (LEVEL_1): d = 0–30 m | d = 30–50 m | d > 50 m | Mean | AP/APH (LEVEL_2): d = 0–30 m | d = 30–50 m | d > 50 m | Mean |
| SECOND[20] | 88.66/88.18 | 67.35/66.70 | 42.89/42.09 | 70.07/69.52 | 87.33/86.86 | 60.92/60.23 | 32.39/31.77 | 61.63/61.14 |
| PV-RCNN[30] | 91.30/90.56 | 73.00/72.31 | 51.35/50.34 | 74.70/74.09 | 89.75/89.29 | 66.32/65.68 | 39.27/38.46 | 66.05/65.50 |
| Voxel-RCNN[9] | 90.81/90.36 | 72.43/71.78 | 50.37/49.47 | 73.90/73.32 | 89.50/89.05 | 65.68/65.08 | 38.32/37.61 | 65.10/64.58 |
| Ours | 91.20/90.77 | 73.28/72.68 | 52.40/51.45 | 74.92/74.38 | 89.91/89.48 | 66.58/66.02 | 40.04/39.29 | 66.19/65.69 |
Tab.3 Comparison of detection results from different algorithms on Waymo validation dataset
| Experiment | LPE | SIC | CAF | AP3D/%: Easy | Mod. | Hard |
| Baseline | | | | 92.38 | 85.29 | 82.86 |
| Experiment (a) | | | | 92.80 | 85.48 | 83.21 |
| Experiment (b) | | | | 93.04 | 85.83 | 83.50 |
| Experiment (c) | | | | 93.27 | 86.00 | 83.57 |
Tab.4 Results of ablation experiments conducted on proposed model (KITTI)
Fig.4 Comparison of visualization results between KITTI and Waymo datasets
| Method | CON | GRU | SEA | CAF | AP3D/% |
| Baseline | | | | | 84.52 |
| Method 1 | | | | | 84.92 |
| Method 2 | | | | | 85.33 |
| Method 3 | | | | | 85.50 |
| Method 4 | | | | | 85.77 |
Tab.5 Performance comparison of various fusion methods for local information fusion (KITTI)
| Model | v/(frame·s⁻¹) | C/GB | AP3D/% |
| Voxel-RCNN[9] | 0.0412 | 2.78 | 85.29 |
| PV-RCNN[30] | 0.1288 | 9.27 | 84.43 |
| Point-RCNN[14] | 0.1532 | 7.71 | 78.63 |
| Ours | 0.0672 | 6.64 | 86.00 |
Tab.6 Comparison of performance and efficiency of proposed method and other models
[1]   MAO J, SHI S, WANG X, et al. 3D object detection for autonomous driving: a comprehensive survey [J]. International Journal of Computer Vision, 2023, 131: 1909-1963. doi: 10.1007/s11263-023-01790-1
[2]   MUHAMMAD K, HUSSAIN T, ULLAH H, et al. Vision-based semantic segmentation in scene understanding for autonomous driving: recent achievements, challenges, and outlooks [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(12): 22694-22715. doi: 10.1109/TITS.2022.3207665
[3]   BEHLEY J, GARBADE M, MILIOTO A, et al. Semantickitti: a dataset for semantic scene understanding of lidar sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 9297-9307.
[4]   LIU Z, WU S, JIN S, et al. Investigating pose representations and motion contexts modeling for 3D motion prediction [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(1): 681-697
[5]   AKSAN E, KAUFMANN M, CAO P, et al. A spatio-temporal transformer for 3d human motion prediction [C]// International Conference on 3D Vision . [S. l. ]: IEEE, 2021: 567-574.
[6]   CUI A, CASAS S, SADAT A, et al. Lookout: diverse multi-future prediction and planning for self-driving [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 16107-16116.
[7]   DERUYTTERE T, VANDENHENDE S, GRUJICIC D, et al. Talk2car: taking control of your self-driving car [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: ACL, 2019: 2088-2098.
[8]   SHENG H, CAI S, LIU Y, et al. Improving 3d object detection with channel-wise transformer [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 2743-2752.
[9]   DENG J, SHI S, LI P, et al. Voxel R-CNN: towards high performance voxel-based 3d object detection [C]// Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver: AAAI, 2021: 1201-1209.
[10]   GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? the kitti vision benchmark suite [C]// IEEE Conference on Computer Vision and Pattern Recognition . Providence: IEEE, 2012: 3354-3361.
[11]   SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 2446-2454.
[12]   HUO Weile, JING Tao, REN Shuang. Review of 3D object detection for autonomous driving [J]. Computer Science, 2023, 50(7): 107-118
[13]   QI C R, SU H, MO K, et al. Pointnet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 652-660.
[14]   SHI S, WANG X, LI H. Pointrcnn: 3d object proposal generation and detection from point cloud [C]// Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 770-779.
[15]   QI C R, LITANY O, HE K, et al. Deep hough voting for 3d object detection in point clouds [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 9277-9286.
[16]   SHI W, RAJKUMAR R. Point-gnn: graph neural network for 3d object detection in a point cloud [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 1711-1719.
[17]   YANG Z, SUN Y, LIU S, et al. Std: sparse-to-dense 3d object detector for point cloud [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 1951-1960.
[18]   YANG Z, SUN Y, LIU S, et al. 3dssd: point-based 3d single stage object detector [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 11040-11048.
[19]   ZHOU Y, TUZEL O. Voxelnet: end-to-end learning for point cloud based 3d object detection [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 4490-4499.
[20]   YAN Y, MAO Y, LI B. Second: sparsely embedded convolutional detection [J]. Sensors, 2018, 18(10): 3337-3353. doi: 10.3390/s18103337
[21]   MAO J, XUE Y, NIU M, et al. Voxel transformer for 3d object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 3164-3173.
[22]   XU Q, ZHOU Y, WANG W, et al. Spg: unsupervised domain adaptation for 3d object detection via semantic point generation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 15446-15456.
[23]   KOO I, LEE I, KIM S H, et al. PG-RCNN: semantic surface point generation for 3D object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Vancouver: IEEE, 2023: 18142-18151.
[24]   YANG H, WANG W, CHEN M, et al. PVT-SSD: single-stage 3D object detector with point-voxel Transformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 13476-13487.
[25]   MAHMOUD A, HU J S, WASLANDER S L. Dense voxel fusion for 3D object detection [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa: IEEE, 2023: 663-672.
[26]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010
[27]   HE C, LI R, LI S, et al. Voxel set transformer: a set-to-set approach to 3d object detection from point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 8417-8427.
[28]   MAO J, NIU M, BAI H, et al. Pyramid R-CNN: towards better performance and adaptability for 3d object detection [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . [S. l. ]: IEEE, 2021: 2723-2732.
[29]   PAN X, XIA Z, SONG S, et al. 3d object detection with pointformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . [S. l. ]: IEEE, 2021: 7463-7472.
[30]   SHI S, GUO C, JIANG L, et al. Pv-rcnn: point-voxel feature set abstraction for 3d object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . [S. l. ]: IEEE, 2020: 10529-10538.
[31]   HU J S, KUAI T, WASLANDER S L. Point density-aware voxels for lidar 3d object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans: IEEE, 2022: 8469-8478.