Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (6): 1121-1132    DOI: 10.3785/j.issn.1008-973X.2024.06.003
    
Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network
Jun YANG1,2, Chen ZHANG1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China

Abstract  

Large-scale point clouds are sparse, traditional point-based methods extract insufficiently rich contextual semantic features, and the resulting semantic segmentation suffers from blurred object boundaries. A 3D point cloud semantic segmentation algorithm based on boundary point estimation and a sparse convolutional neural network was proposed, consisting of a voxel branch and a point branch. In the voxel branch, the original point cloud was voxelized and contextual semantic features were obtained by sparse convolution; an initial semantic label for each point was then obtained by devoxelization; finally, the initial labels were fed into the boundary point estimation module to identify likely boundary points. In the point branch, an improved dynamic graph convolution module was first used to extract local geometric features of the point cloud; the local features were then enhanced by a spatial attention module followed by a channel attention module. Finally, the local geometric features from the point branch were fused with the contextual features from the voxel branch to enrich the point cloud representation. The semantic segmentation accuracy of the algorithm reached 69.5% on the S3DIS dataset and 62.7% on the SemanticKITTI dataset. Experimental results show that the proposed algorithm extracts richer point cloud features, segments object boundary regions accurately, and exhibits good semantic segmentation ability for 3D point clouds.
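The voxel branch described above maps points into a voxel grid, computes per-voxel features, and later copies voxel-level results back to points (devoxelization). The paper's branch uses sparse convolution for the feature extraction itself; the sketch below only illustrates the surrounding voxelize/devoxelize bookkeeping with a NumPy scatter-mean, and `voxelize_mean`/`devoxelize` are hypothetical helper names, not the authors' code.

```python
import numpy as np

def voxelize_mean(points, feats, voxel_size):
    """Map each point to a voxel cell and average point features per voxel.

    Returns the occupied voxel coordinates, per-voxel mean features, and the
    per-point voxel index needed later for devoxelization.
    """
    # Integer voxel coordinates of every point.
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Unique occupied voxels; inv[i] is the voxel index of point i.
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    # Scatter-mean: accumulate point features into their voxels, then normalize.
    vox_feats = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(vox_feats, inv, feats)
    counts = np.bincount(inv, minlength=len(uniq)).astype(float)
    vox_feats /= counts[:, None]
    return uniq, vox_feats, inv

def devoxelize(vox_feats, inv):
    """Copy each voxel's feature back to all points that fall inside it."""
    return vox_feats[inv]
```

In the actual pipeline the per-voxel features would pass through the sparse-convolution U-Net before being gathered back to points.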



Key words: point cloud data; semantic segmentation; attention mechanism; sparse convolution; voxelization
Received: 20 May 2023      Published: 25 May 2024
CLC:  TP 391  
Fund: National Natural Science Foundation of China (42261067); Lanzhou Talent Innovation and Entrepreneurship Project (2020-RC-22); Outstanding Graduate "Innovation Star" Project of the Gansu Provincial Department of Education (2022CXZX-613).
Cite this article:

Jun YANG,Chen ZHANG. Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network. Journal of ZheJiang University (Engineering Science), 2024, 58(6): 1121-1132.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.06.003     OR     https://www.zjujournals.com/eng/Y2024/V58/I6/1121


Fig.1 Architecture of 3D point cloud semantic segmentation network based on boundary point estimation and sparse convolution neural network
Fig.2 Structure of U-Net based on sparse convolution
Fig.3 Structure of boundary point estimation module
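The boundary point estimation module (Fig.3) takes the initial per-point labels from the voxel branch and flags points that likely sit on object boundaries. The exact criterion is defined in the paper; a minimal sketch of one plausible criterion, label disagreement within a point's k-nearest neighborhood, is shown below, with `k` and the threshold `t` as illustrative parameters.

```python
import numpy as np

def estimate_boundary_points(points, labels, k=8, t=0.4):
    """Flag points whose k-NN neighborhood disagrees on the initial label.

    A point is a candidate boundary point when the fraction of its k nearest
    neighbors sharing its own label drops below threshold t. Illustrative
    criterion only; real point clouds would use a KD-tree, not dense distances.
    """
    # Pairwise Euclidean distances (O(n^2); fine for a small demo).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    knn = np.argsort(d, axis=1)[:, :k]          # indices of k nearest neighbors
    agree = (labels[knn] == labels[:, None]).mean(axis=1)
    return agree < t                            # True => likely boundary point
```

On a point strip labeled 0 on one half and 1 on the other, only the points adjacent to the label transition are flagged.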
| Method | OA/% | mIoU/% | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet [5] | 79.3 | 41.1 | 88.8 | 97.3 | 69.8 | 0.1 | 3.9 | 46.3 | 10.8 | 59.0 | 52.6 | 5.9 | 40.3 | 26.4 | 33.2 |
| TangentConv [28] | 82.5 | 52.6 | 90.5 | 97.7 | 74.0 | 0.0 | 20.7 | 39.0 | 31.3 | 77.5 | 69.4 | 57.3 | 38.5 | 48.8 | 39.8 |
| PointCNN [29] | 85.9 | 57.3 | 92.3 | 98.2 | 79.4 | 0.0 | 17.6 | 22.8 | 62.1 | 74.4 | 80.6 | 31.7 | 66.7 | 62.1 | 56.7 |
| SPG [30] | 86.4 | 58.0 | 89.4 | 96.9 | 78.1 | 0.0 | 42.8 | 48.9 | 61.6 | 84.7 | 75.4 | 69.8 | 52.6 | 2.1 | 52.2 |
| PointWeb [31] | 87.0 | 60.3 | 92.0 | 98.5 | 79.4 | 0.0 | 21.1 | 59.7 | 34.8 | 76.3 | 88.3 | 46.9 | 69.3 | 64.9 | 52.5 |
| HPEIN [32] | 87.2 | 61.9 | 91.5 | 98.2 | 81.4 | 0.0 | 23.3 | 65.3 | 40.0 | 75.5 | 87.7 | 58.5 | 67.8 | 65.6 | 49.4 |
| RandLA-Net [18] | 87.2 | 62.4 | 91.1 | 95.6 | 80.2 | 0.0 | 24.7 | 62.3 | 47.7 | 76.2 | 83.7 | 60.2 | 71.1 | 65.7 | 53.8 |
| GACNet [33] | 87.8 | 62.8 | 92.3 | 98.3 | 81.9 | 0.0 | 20.3 | 59.1 | 40.8 | 78.5 | 85.8 | 61.7 | 70.7 | 74.7 | 52.8 |
| PPCNN++ [34] | — | 64.0 | 94.0 | 98.5 | 83.7 | 0.0 | 18.6 | 66.1 | 61.7 | 79.4 | 88.0 | 49.5 | 70.1 | 66.4 | 56.1 |
| BAAF-Net [35] | 88.9 | 65.4 | 92.9 | 97.9 | 82.3 | 0.0 | 23.1 | 65.5 | 64.9 | 78.5 | 87.5 | 61.4 | 70.7 | 68.7 | 57.2 |
| KPConv [36] | — | 67.1 | 92.8 | 97.3 | 82.4 | 0.0 | 23.9 | 58.0 | 69.0 | 81.5 | 91.0 | 75.4 | 75.3 | 66.7 | 58.9 |
| AGConv [37] | 90.0 | 67.9 | 93.9 | 98.4 | 82.2 | 0.0 | 23.9 | 59.1 | 71.3 | 91.5 | 81.2 | 75.5 | 74.9 | 72.1 | 58.6 |
| Proposed method | 90.8 | 69.5 | 94.4 | 99.2 | 87.2 | 0.0 | 27.2 | 62.2 | 72.8 | 91.8 | 85.8 | 79.0 | 66.7 | 74.4 | 62.9 |
Tab.1 Comparison of segmentation accuracy of different methods on S3DIS dataset (Area 5 as test set)
Fig.4 Visualization of segmentation results of S3DIS dataset
| Method | mIoU/% | road | sidewalk | parking | other-ground | building | car | truck | bicycle | motorcycle | other-vehicle | vegetation | trunk | terrain | person | bicyclist | motorcyclist | fence | pole | traffic-sign |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet [5] | 14.6 | 61.6 | 35.7 | 15.8 | 1.4 | 41.4 | 46.3 | 0.1 | 1.3 | 0.3 | 0.8 | 31.0 | 4.6 | 17.6 | 0.2 | 0.2 | 0.0 | 12.9 | 2.4 | 3.7 |
| SPG [30] | 17.4 | 45.0 | 28.5 | 1.6 | 0.6 | 64.3 | 49.3 | 0.1 | 0.2 | 0.2 | 0.8 | 48.9 | 27.2 | 24.6 | 0.3 | 2.7 | 0.1 | 20.8 | 15.9 | 0.8 |
| PointNet++ [6] | 20.1 | 72.0 | 41.8 | 18.7 | 5.6 | 62.3 | 53.7 | 0.9 | 1.9 | 0.2 | 0.2 | 46.5 | 13.8 | 30.0 | 0.9 | 1.0 | 0.0 | 16.9 | 6.0 | 8.9 |
| TangentConv [28] | 40.9 | 83.9 | 63.9 | 33.4 | 15.4 | 83.4 | 90.8 | 15.2 | 2.7 | 16.5 | 12.1 | 79.5 | 49.3 | 58.1 | 23.0 | 28.4 | 8.1 | 49.0 | 35.8 | 28.5 |
| SpSequenceNet [38] | 43.1 | 90.1 | 73.9 | 57.6 | 27.1 | 91.2 | 88.5 | 29.2 | 24.0 | 0.0 | 22.7 | 84.0 | 66.0 | 65.7 | 6.3 | 0.0 | 0.0 | 67.7 | 50.8 | 48.7 |
| HPGCNN [39] | 50.5 | 89.5 | 73.6 | 58.8 | 34.6 | 91.2 | 93.1 | 21.0 | 6.5 | 17.6 | 23.3 | 84.4 | 65.9 | 70.0 | 32.1 | 30.0 | 14.7 | 65.5 | 45.5 | 41.5 |
| RangeNet++ [40] | 52.2 | 91.8 | 75.2 | 65.0 | 27.8 | 87.4 | 91.4 | 25.7 | 25.7 | 34.4 | 23.0 | 80.5 | 55.1 | 64.6 | 38.3 | 38.8 | 4.8 | 58.6 | 47.9 | 55.9 |
| RandLA-Net [18] | 53.9 | 90.7 | 73.7 | 60.3 | 20.4 | 86.9 | 94.2 | 40.1 | 26.0 | 25.8 | 38.9 | 81.4 | 61.3 | 66.8 | 49.2 | 48.2 | 7.2 | 56.3 | 49.2 | 47.7 |
| PolarNet [41] | 54.3 | 90.8 | 74.4 | 61.7 | 21.7 | 90.0 | 93.8 | 22.9 | 40.3 | 30.1 | 28.5 | 84.0 | 65.5 | 67.8 | 43.2 | 40.2 | 5.6 | 61.3 | 51.8 | 57.5 |
| 3D-MiniNet [42] | 55.8 | 91.6 | 74.5 | 64.2 | 25.4 | 89.4 | 90.5 | 28.5 | 42.3 | 42.1 | 29.4 | 82.8 | 60.8 | 66.7 | 47.8 | 44.1 | 14.5 | 60.8 | 48.0 | 56.6 |
| SAFFGCNN [43] | 56.6 | 89.9 | 73.9 | 63.5 | 35.1 | 91.5 | 95.0 | 38.3 | 33.2 | 35.1 | 28.7 | 84.4 | 67.1 | 69.5 | 45.3 | 43.5 | 7.3 | 66.1 | 54.3 | 53.7 |
| KPConv [36] | 58.8 | 88.8 | 72.7 | 61.3 | 31.6 | 90.5 | 96.0 | 33.4 | 30.2 | 42.5 | 31.6 | 84.8 | 69.2 | 69.1 | 61.5 | 61.6 | 11.8 | 64.2 | 56.4 | 48.4 |
| BAAF-Net [35] | 59.9 | 90.9 | 74.4 | 62.2 | 23.6 | 89.8 | 95.4 | 48.7 | 31.8 | 35.5 | 46.7 | 82.7 | 63.4 | 67.9 | 49.5 | 55.7 | 53.0 | 60.8 | 53.7 | 52.0 |
| TORNADONet [44] | 61.1 | 90.8 | 75.3 | 65.3 | 27.5 | 89.6 | 93.1 | 43.1 | 53.0 | 44.4 | 39.4 | 84.1 | 64.3 | 69.6 | 61.6 | 56.7 | 20.2 | 62.9 | 55.0 | 64.2 |
| FusionNet [20] | 61.3 | 91.8 | 77.1 | 68.8 | 30.8 | 92.5 | 95.3 | 41.8 | 47.5 | 37.7 | 34.5 | 84.5 | 69.8 | 68.5 | 59.5 | 56.8 | 11.9 | 69.4 | 60.0 | 66.5 |
| Proposed method | 62.7 | 92.7 | 78.5 | 71.6 | 31.5 | 91.4 | 95.5 | 40.9 | 46.1 | 48.0 | 42.2 | 85.2 | 68.4 | 70.2 | 63.9 | 54.3 | 23.8 | 68.6 | 56.7 | 62.8 |
Tab.2 Comparison of segmentation accuracy of different methods on SemanticKITTI dataset
Fig.5 Visualization of segmentation results of SemanticKITTI dataset
| t | mIoU/% |
|---|---|
| 0.1 | 64.5 |
| 0.2 | 65.9 |
| 0.3 | 67.4 |
| 0.4 | 69.5 |
| 0.5 | 68.6 |
| 0.6 | 68.2 |
| 0.7 | 68.0 |
| 0.8 | 67.9 |
| 0.9 | 67.7 |
| 1.0 | 67.6 |
Tab.3 Effectiveness verification of boundary point estimation module with t interval of 0.1
| t | mIoU/% |
|---|---|
| 0.32 | 67.4 |
| 0.34 | 67.9 |
| 0.36 | 68.2 |
| 0.38 | 68.8 |
| 0.40 | 69.5 |
| 0.42 | 69.2 |
| 0.44 | 69.1 |
| 0.46 | 68.9 |
| 0.48 | 68.8 |
Tab.4 Effectiveness verification of boundary point estimation module with t interval of 0.02
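Tab.3 and Tab.4 together form a coarse-to-fine sweep over the threshold t: a coarse grid with step 0.1 locates the best region (t = 0.4), then a finer 0.02-step grid around that value confirms 0.40 as the optimum. This two-stage search can be sketched generically as below; `score` stands in for a full mIoU evaluation run and is hypothetical.

```python
def coarse_to_fine_search(score, coarse, fine_step, fine_radius):
    """Two-stage 1-D grid search: sweep a coarse grid, then refine around
    the best coarse value with a finer step (as in Tab.3 followed by Tab.4)."""
    best = max(coarse, key=score)                     # coarse winner
    lo, hi = best - fine_radius, best + fine_radius   # fine search window
    steps = int(round((hi - lo) / fine_step))
    fine = [round(lo + i * fine_step, 10) for i in range(steps + 1)]
    return max(fine, key=score)                       # refined winner
```

Each call to `score` is expensive in practice (a full validation pass), which is why the paper coarsely brackets the optimum before refining.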
| r/cm | mIoU/% |
|---|---|
| 1 | 68.3 |
| 2 | 68.8 |
| 3 | 69.1 |
| 4 | 69.5 |
| 5 | 69.3 |
| 6 | 69.2 |
| 7 | 68.6 |
| 8 | 68.1 |
Tab.5 Influence of different voxel resolutions on segmentation results
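The trade-off behind Tab.5 is that a finer voxel resolution r produces more occupied voxels (better geometric detail) but fewer points per voxel (noisier per-voxel features and higher cost), so an intermediate r wins. A tiny sketch of the quantity that changes with r, the number of occupied voxels, under an assumed axis-aligned grid:

```python
import numpy as np

def occupied_voxels(points, voxel_size):
    """Count occupied voxels at a given resolution: finer grids preserve more
    geometry, but each voxel then averages fewer points."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    return len(np.unique(coords, axis=0))
```

Halving the voxel size on the same cloud can only keep or increase the occupied-voxel count, never decrease it.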
Fig.6 Visualization of segmentation results of ablation experiments on S3DIS dataset
[1]   SHI S, GUO C, JIANG L, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 10526−10535.
[2]   CHABRA R, LENSSEN J, ILG E, et al. Deep local shapes: learning local SDF priors for detailed 3D reconstruction [C]// Proceedings of the European Conference on Computer Vision . Glasgow: Springer, 2020: 608−625.
[3]   HU W, ZHAO H, JIANG L, et al. Bidirectional projection network for cross dimension scene understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . [s. l.]: IEEE, 2021: 14373−14382.
[4]   DANG J S, YANG J. LHPHGCNN: lightweight hierarchical parallel heterogeneous group convolutional neural networks for point cloud scene prediction [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(10): 18903-18915. doi: 10.1109/TITS.2022.3167910
[5]   QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 77−85.
[6]   QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: MIT Press, 2017: 5099−5108.
[7]   LAWIN F J, DANELLJAN M, TOSTEBERG P, et al. Deep projective 3D semantic segmentation [C]// International Conference on Computer Analysis of Images and Patterns . Ystad: Springer, 2017: 95−107.
[8]   BOULCH A, GUERRY J, SAUX B, et al. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks [J]. Computers & Graphics, 2018, 71: 189-198. doi: 10.1016/j.cag.2017.11.010
[9]   GUERRY J, BOULCH A, LE S, et al. SnapNet-R: consistent 3D multi-view semantic labeling for robotics [C]// Proceedings of the IEEE International Conference on Computer Vision . Venice: IEEE, 2017: 669−678.
[10]   CORTINHAL T, TZELEPIS G, ERDAL E, et al. SalsaNext: fast, uncertainty-aware semantic segmentation of LiDAR point clouds [C]// International Symposium on Visual Computing . San Diego: Springer, 2020: 207−222.
[11]   ÇICEK O, ABDULKADIR A, LIENKAMP S S, et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation [C]// Medical Image Computing and Computer-Assisted Intervention . Athens: Springer, 2016: 424−432.
[12]   WANG P S, LIU Y, GUO Y X, et al. O-CNN: octree-based convolutional neural networks for 3D shape analysis [J]. ACM Transactions on Graphics, 2017, 36(4): 1-11.
[13]   MENG H Y, GAO L, LAI Y K, et al. VV-Net: voxel VAE net with group convolutions for point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 8499−8507.
[14]   LE T, DUAN Y. PointGrid: a deep network for 3D shape understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 9204−9214.
[15]   WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds [J]. ACM Transactions on Graphics, 2018, 38(5): 146-158.
[16]   KANG Z H, LI N. PyramNet: point cloud pyramid attention network and graph embedding module for classification and segmentation [J]. Australian Journal of Intelligent Information Processing Systems, 2019, 16(2): 35-43.
[17]   DANG Jisheng, YANG Jun. 3D model recognition and segmentation based on multi-feature fusion [J]. Journal of Xidian University, 2020, 47(4): 149-157.
[18]   HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 11105−11114.
[19]   LIU Z J, TANG H T, LIN Y J, et al. Point-voxel CNN for efficient 3D deep learning [C]// Advances in Neural Information Processing Systems . Vancouver: MIT Press, 2019: 963−973.
[20]   ZHANG F H, FANG J, WAH B, et al. Deep fusionnet for point cloud semantic segmentation [C]// Proceedings of the European Conference on Computer Vision . Glasgow: Springer, 2020: 644−663.
[21]   LIONG V E, NGUYEN T N T, WIDJAJA S, et al. AMVNet: assertion-based multi-view fusion network for LiDAR semantic segmentation [EB/OL]. (2020-12-09) [2023-02-12]. https://doi.org/10.48550/arXiv.2012.04934.
[22]   XU J Y, ZHANG R X, DOU J, et al. RPVNet: a deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 16004−16013.
[23]   RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Medical Image Computing and Computer-Assisted Intervention . Munich: Springer, 2015: 234−241.
[24]   GRAHAM B, ENGELCKE M, MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 9224−9232.
[25]   YANG Jun, ZHANG Chen. Semantic segmentation of 3D point cloud fusing dual attention mechanism and dynamic graph convolutional neural network [EB/OL]. (2023-01-10) [2023-02-12]. https://bhxb.buaa.edu.cn/bhzk/article/doi/10.13700/j.bh.1001-5965.2022.0775.
[26]   ARMENI I, SENER O, ZAMIR A, et al. 3D semantic parsing of large-scale indoor spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 1534−1543.
[27]   BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 9296−9306.
[28]   TATARCHENKO M, PARK J, KOLTUN V, et al. Tangent convolutions for dense prediction in 3D [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 3887−3896.
[29]   LI Y, BU R, SUN M, et al. PointCNN: convolution on x-transformed points [C]// Advances in Neural Information Processing Systems . Montréal: MIT Press, 2018: 828−838.
[30]   LANDRIEU L, SIMONOVSKY M. Large-scale point cloud semantic segmentation with superpoint graphs [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 4558−4567.
[31]   ZHAO H, JIANG L, FU C W, et al. PointWeb: enhancing local neighborhood features for point cloud processing [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 5565−5573.
[32]   JIANG L, ZHAO H S, LIU S, et al. Hierarchical point-edge interaction network for point cloud semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 10432−10440.
[33]   WANG L, HUANG Y, HOU Y, et al. Graph attention convolution for point cloud semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 10296−10305.
[34]   AHN P, YANG J, YI E, et al. Projection-based point convolution for efficient point cloud segmentation [J]. IEEE Access, 2022, 10: 15348-15358. doi: 10.1109/ACCESS.2022.3144449
[35]   QIU S, ANWAR S, BARNES N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . [s. l.]: IEEE, 2021: 1757−1767.
[36]   THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 6410−6419.
[37]   WEI M, WEI Z, ZHOU H, et al. AGConv: adaptive graph convolution on 3D point clouds [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9374-9392.
[38]   SHI H Y, LIN G S, WANG H, et al. SpSequenceNet: semantic segmentation network on 4D point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 4573–4582.
[39]   DANG J S, YANG J. HPGCNN: hierarchical parallel group convolutional neural networks for point clouds processing [C]// Proceedings of the Asian Conference on Computer Vision . Kyoto: Springer, 2020: 20−37.
[40]   MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: fast and accurate LiDAR semantic segmentation [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems . Macau: IEEE, 2019: 4213−4220.
[41]   ZHANG Y, ZHOU Z, DAVID P, et al. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 9598−9607.
[42]   ALONSO I, RIAZUELO L, MONTESANO L, et al. 3D-MiniNet: learning a 2D representation from point clouds for fast and efficient 3D LiDAR semantic segmentation [J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5432-5439. doi: 10.1109/LRA.2020.3007440
[43]   YANG Jun, LI Bozan. Semantic segmentation of 3D point cloud based on self-attention feature fusion group convolutional neural network [J]. Optics and Precision Engineering, 2022, 30(7): 840-853. doi: 10.37188/OPE.20223007.0840