Journal of Zhejiang University (Engineering Science)  2024, Vol. 58, Issue (6): 1121-1132    DOI: 10.3785/j.issn.1008-973X.2024.06.003
Computer Technology
Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network
Jun YANG1,2(),Chen ZHANG1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract:

Large-scale point clouds are sparse, traditional point cloud methods extract insufficiently rich contextual semantic features, and their segmentation results suffer from blurred object boundaries. To address these problems, a 3D point cloud semantic segmentation algorithm based on boundary point estimation and a sparse convolutional neural network was proposed, consisting mainly of a voxel branch and a point branch. In the voxel branch, the original point cloud was voxelized and contextual semantic features were obtained by sparse convolution; de-voxelization then yielded an initial semantic label for each point, which was fed into the boundary point estimation module to obtain the likely boundary points. In the point branch, an improved dynamic graph convolution module first extracted local geometric features of the point cloud; these local features were then enhanced by a spatial attention module and a channel attention module in turn. Finally, the local geometric features from the point branch were fused with the contextual features from the voxel branch to enrich the point cloud features. The semantic segmentation accuracy of the algorithm reached 69.5% on the S3DIS dataset and 62.7% on the SemanticKITTI dataset. Experimental results show that the proposed algorithm extracts richer point cloud features, segments object boundary regions accurately, and has good 3D point cloud semantic segmentation capability.
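The voxel branch's voxelize / sparse-convolve / de-voxelize round trip described above can be illustrated with a minimal NumPy sketch. This is not the paper's code: the sparse convolution is replaced by a per-voxel feature placeholder, and `voxelize`/`devoxelize` are hypothetical helper names.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Map each point to an integer voxel coordinate (floor division)."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    return uniq, inverse.reshape(-1)  # occupied voxels, and each point's voxel index

def devoxelize(voxel_features, inverse):
    """De-voxelization as a gather: each point inherits its voxel's feature."""
    return voxel_features[inverse]

# toy cloud: 4 points, voxel size 1.0 -> two occupied voxels
pts = np.array([[0.2, 0.1, 0.0],
                [0.7, 0.4, 0.3],
                [1.5, 0.2, 0.1],
                [1.9, 0.8, 0.6]])
vox, inv = voxelize(pts, 1.0)
feat = np.arange(len(vox), dtype=float)[:, None]  # stand-in for sparse-conv output
per_point = devoxelize(feat, inv)                 # (4, 1): one feature per point
```

De-voxelization here is just a gather: every point inherits the feature of the voxel it fell into, which is what lets the voxel branch emit an initial semantic label per point.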

Key words: point cloud data; semantic segmentation; attention mechanism; sparse convolution; voxelization
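The point branch builds on a dynamic graph convolution module. As a rough sketch of the underlying idea, assuming the standard EdgeConv formulation of DGCNN [15] rather than the paper's improved variant, the edge feature for each point concatenates the point itself with the offsets to its k nearest neighbors:

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbors of every point (self excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def edge_features(points, k):
    """EdgeConv-style input: concat(x_i, x_j - x_i) over the kNN graph."""
    idx = knn(points, k)                           # (N, k) neighbor indices
    xi = np.repeat(points[:, None, :], k, axis=1)  # (N, k, 3) center point
    xj = points[idx]                               # (N, k, 3) neighbors
    return np.concatenate([xi, xj - xi], axis=-1)  # (N, k, 6) edge features

rng = np.random.default_rng(0)
pts = rng.random((8, 3))
feats = edge_features(pts, k=3)
```

The graph is "dynamic" in DGCNN because the kNN graph is recomputed in feature space at every layer; the sketch above only shows the first, coordinate-space layer.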
Received: 2023-05-20    Published: 2024-05-25
CLC:  TP 391  
Supported by: the National Natural Science Foundation of China (42261067), the Lanzhou Talent Innovation and Entrepreneurship Project (2020-RC-22), and the Gansu Provincial Department of Education Outstanding Postgraduate "Innovation Star" Project (2022CXZX-613).
About the author: YANG Jun (1973—), male, professor and doctoral supervisor, whose research covers spatial analysis of 3D models, pattern recognition, and intelligent interpretation of remote sensing big data. orcid.org/0000-0001-6403-3408. E-mail: yangj@mail.lzjtu.cn

Cite this article:


Jun YANG, Chen ZHANG. Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network. Journal of Zhejiang University (Engineering Science), 2024, 58(6): 1121-1132.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.06.003        https://www.zjujournals.com/eng/CN/Y2024/V58/I6/1121

Fig. 1  Architecture of the 3D point cloud semantic segmentation network based on boundary point estimation and sparse convolutional neural network
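The abstract mentions spatial and channel attention modules that enhance the point branch's local features. A generic squeeze-and-excitation style channel attention is sketched below purely as an illustrative stand-in for the paper's module; the weight matrices are random placeholders for learned parameters, and the exact architecture is an assumption.

```python
import numpy as np

def channel_attention(feats, reduction=4, seed=0):
    """Squeeze-and-excitation style channel attention (illustrative stand-in).

    Global average pooling over points -> two small linear maps -> sigmoid
    channel weights. w1/w2 are random placeholders for learned parameters.
    """
    n, c = feats.shape
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c, c // reduction)) / np.sqrt(c)
    w2 = rng.standard_normal((c // reduction, c)) / np.sqrt(c // reduction)
    squeeze = feats.mean(axis=0)                   # (c,) pooled channel descriptor
    hidden = np.maximum(squeeze @ w1, 0.0)         # ReLU bottleneck
    excite = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # sigmoid weights in (0, 1)
    return feats * excite                          # reweight each channel

point_feats = np.random.default_rng(1).random((128, 16))
out = channel_attention(point_feats)
```

Spatial attention works analogously but pools over channels and reweights points instead of feature channels.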
Fig. 2  Structure of the sparse-convolution-based U-Net model
Fig. 3  Structure of the boundary point estimation module
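The paper's exact boundary point estimation criterion is not reproduced on this page. As a plausible minimal sketch only, a point can be flagged as a candidate boundary point when the initial semantic labels in its neighborhood disagree with its own label for more than a fraction t of neighbors (all names hypothetical):

```python
import numpy as np

def estimate_boundary_points(points, labels, k=8, t=0.4):
    """Flag candidate boundary points (hypothetical criterion, not the paper's).

    A point is kept when more than a fraction t of its k nearest neighbors
    carry a different initial semantic label than the point itself.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]                       # kNN indices
    disagree = (labels[idx] != labels[:, None]).mean(axis=1)  # label-mixing ratio
    return disagree > t

# two flat segments meeting at x = 1: only points near the seam are flagged
xs = np.linspace(0.0, 2.0, 20)
pts = np.stack([xs, np.zeros_like(xs), np.zeros_like(xs)], axis=1)
initial_labels = (xs > 1.0).astype(int)
mask = estimate_boundary_points(pts, initial_labels, k=4, t=0.4)
```

On this toy cloud only the two points straddling the seam are flagged, which matches the module's stated purpose: isolating the region where the initial labels from the voxel branch are unreliable.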
| Method | OA/% | mIoU/% | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet [5] | 79.3 | 41.1 | 88.8 | 97.3 | 69.8 | 0.1 | 3.9 | 46.3 | 10.8 | 59.0 | 52.6 | 5.9 | 40.3 | 26.4 | 33.2 |
| TangentConv [28] | 82.5 | 52.6 | 90.5 | 97.7 | 74.0 | 0.0 | 20.7 | 39.0 | 31.3 | 77.5 | 69.4 | 57.3 | 38.5 | 48.8 | 39.8 |
| PointCNN [29] | 85.9 | 57.3 | 92.3 | 98.2 | 79.4 | 0.0 | 17.6 | 22.8 | 62.1 | 74.4 | 80.6 | 31.7 | 66.7 | 62.1 | 56.7 |
| SPG [30] | 86.4 | 58.0 | 89.4 | 96.9 | 78.1 | 0.0 | 42.8 | 48.9 | 61.6 | 84.7 | 75.4 | 69.8 | 52.6 | 2.1 | 52.2 |
| PointWeb [31] | 87.0 | 60.3 | 92.0 | 98.5 | 79.4 | 0.0 | 21.1 | 59.7 | 34.8 | 76.3 | 88.3 | 46.9 | 69.3 | 64.9 | 52.5 |
| HPEIN [32] | 87.2 | 61.9 | 91.5 | 98.2 | 81.4 | 0.0 | 23.3 | 65.3 | 40.0 | 75.5 | 87.7 | 58.5 | 67.8 | 65.6 | 49.4 |
| RandLA-Net [18] | 87.2 | 62.4 | 91.1 | 95.6 | 80.2 | 0.0 | 24.7 | 62.3 | 47.7 | 76.2 | 83.7 | 60.2 | 71.1 | 65.7 | 53.8 |
| GACNet [33] | 87.8 | 62.8 | 92.3 | 98.3 | 81.9 | 0.0 | 20.3 | 59.1 | 40.8 | 78.5 | 85.8 | 61.7 | 70.7 | 74.7 | 52.8 |
| PPCNN++ [34] | — | 64.0 | 94.0 | 98.5 | 83.7 | 0.0 | 18.6 | 66.1 | 61.7 | 79.4 | 88.0 | 49.5 | 70.1 | 66.4 | 56.1 |
| BAAF-Net [35] | 88.9 | 65.4 | 92.9 | 97.9 | 82.3 | 0.0 | 23.1 | 65.5 | 64.9 | 78.5 | 87.5 | 61.4 | 70.7 | 68.7 | 57.2 |
| KPConv [36] | — | 67.1 | 92.8 | 97.3 | 82.4 | 0.0 | 23.9 | 58.0 | 69.0 | 81.5 | 91.0 | 75.4 | 75.3 | 66.7 | 58.9 |
| AGConv [37] | 90.0 | 67.9 | 93.9 | 98.4 | 82.2 | 0.0 | 23.9 | 59.1 | 71.3 | 91.5 | 81.2 | 75.5 | 74.9 | 72.1 | 58.6 |
| Ours | 90.8 | 69.5 | 94.4 | 99.2 | 87.2 | 0.0 | 27.2 | 62.2 | 72.8 | 91.8 | 85.8 | 79.0 | 66.7 | 74.4 | 62.9 |
Table 1  Comparison of segmentation accuracy of different methods on the S3DIS dataset (Area 5 as the test set)
Fig. 4  Visualization of segmentation results on the S3DIS dataset
| Method | mIoU/% | road | sidewalk | parking | other-ground | building | car | truck | bicycle | motorcycle | other-vehicle | vegetation | trunk | terrain | person | bicyclist | motorcyclist | fence | pole | traffic-sign |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet [5] | 14.6 | 61.6 | 35.7 | 15.8 | 1.4 | 41.4 | 46.3 | 0.1 | 1.3 | 0.3 | 0.8 | 31.0 | 4.6 | 17.6 | 0.2 | 0.2 | 0.0 | 12.9 | 2.4 | 3.7 |
| SPG [30] | 17.4 | 45.0 | 28.5 | 1.6 | 0.6 | 64.3 | 49.3 | 0.1 | 0.2 | 0.2 | 0.8 | 48.9 | 27.2 | 24.6 | 0.3 | 2.7 | 0.1 | 20.8 | 15.9 | 0.8 |
| PointNet++ [6] | 20.1 | 72.0 | 41.8 | 18.7 | 5.6 | 62.3 | 53.7 | 0.9 | 1.9 | 0.2 | 0.2 | 46.5 | 13.8 | 30.0 | 0.9 | 1.0 | 0.0 | 16.9 | 6.0 | 8.9 |
| TangentConv [28] | 40.9 | 83.9 | 63.9 | 33.4 | 15.4 | 83.4 | 90.8 | 15.2 | 2.7 | 16.5 | 12.1 | 79.5 | 49.3 | 58.1 | 23.0 | 28.4 | 8.1 | 49.0 | 35.8 | 28.5 |
| SpSequenceNet [38] | 43.1 | 90.1 | 73.9 | 57.6 | 27.1 | 91.2 | 88.5 | 29.2 | 24.0 | 0.0 | 22.7 | 84.0 | 66.0 | 65.7 | 6.3 | 0.0 | 0.0 | 67.7 | 50.8 | 48.7 |
| HPGCNN [39] | 50.5 | 89.5 | 73.6 | 58.8 | 34.6 | 91.2 | 93.1 | 21.0 | 6.5 | 17.6 | 23.3 | 84.4 | 65.9 | 70.0 | 32.1 | 30.0 | 14.7 | 65.5 | 45.5 | 41.5 |
| RangeNet++ [40] | 52.2 | 91.8 | 75.2 | 65.0 | 27.8 | 87.4 | 91.4 | 25.7 | 25.7 | 34.4 | 23.0 | 80.5 | 55.1 | 64.6 | 38.3 | 38.8 | 4.8 | 58.6 | 47.9 | 55.9 |
| RandLA-Net [18] | 53.9 | 90.7 | 73.7 | 60.3 | 20.4 | 86.9 | 94.2 | 40.1 | 26.0 | 25.8 | 38.9 | 81.4 | 61.3 | 66.8 | 49.2 | 48.2 | 7.2 | 56.3 | 49.2 | 47.7 |
| PolarNet [41] | 54.3 | 90.8 | 74.4 | 61.7 | 21.7 | 90.0 | 93.8 | 22.9 | 40.3 | 30.1 | 28.5 | 84.0 | 65.5 | 67.8 | 43.2 | 40.2 | 5.6 | 61.3 | 51.8 | 57.5 |
| 3D-MiniNet [42] | 55.8 | 91.6 | 74.5 | 64.2 | 25.4 | 89.4 | 90.5 | 28.5 | 42.3 | 42.1 | 29.4 | 82.8 | 60.8 | 66.7 | 47.8 | 44.1 | 14.5 | 60.8 | 48.0 | 56.6 |
| SAFFGCNN [43] | 56.6 | 89.9 | 73.9 | 63.5 | 35.1 | 91.5 | 95.0 | 38.3 | 33.2 | 35.1 | 28.7 | 84.4 | 67.1 | 69.5 | 45.3 | 43.5 | 7.3 | 66.1 | 54.3 | 53.7 |
| KPConv [36] | 58.8 | 88.8 | 72.7 | 61.3 | 31.6 | 90.5 | 96.0 | 33.4 | 30.2 | 42.5 | 31.6 | 84.8 | 69.2 | 69.1 | 61.5 | 61.6 | 11.8 | 64.2 | 56.4 | 48.4 |
| BAAF-Net [35] | 59.9 | 90.9 | 74.4 | 62.2 | 23.6 | 89.8 | 95.4 | 48.7 | 31.8 | 35.5 | 46.7 | 82.7 | 63.4 | 67.9 | 49.5 | 55.7 | 53.0 | 60.8 | 53.7 | 52.0 |
| TORNADONet [44] | 61.1 | 90.8 | 75.3 | 65.3 | 27.5 | 89.6 | 93.1 | 43.1 | 53.0 | 44.4 | 39.4 | 84.1 | 64.3 | 69.6 | 61.6 | 56.7 | 20.2 | 62.9 | 55.0 | 64.2 |
| FusionNet [20] | 61.3 | 91.8 | 77.1 | 68.8 | 30.8 | 92.5 | 95.3 | 41.8 | 47.5 | 37.7 | 34.5 | 84.5 | 69.8 | 68.5 | 59.5 | 56.8 | 11.9 | 69.4 | 60.0 | 66.5 |
| Ours | 62.7 | 92.7 | 78.5 | 71.6 | 31.5 | 91.4 | 95.5 | 40.9 | 46.1 | 48.0 | 42.2 | 85.2 | 68.4 | 70.2 | 63.9 | 54.3 | 23.8 | 68.6 | 56.7 | 62.8 |
Table 2  Comparison of segmentation accuracy of different methods on the SemanticKITTI dataset
Fig. 5  Visualization of segmentation results on the SemanticKITTI dataset
| t | mIoU/% | t | mIoU/% |
| --- | --- | --- | --- |
| 0.1 | 64.5 | 0.6 | 68.2 |
| 0.2 | 65.9 | 0.7 | 68.0 |
| 0.3 | 67.4 | 0.8 | 67.9 |
| 0.4 | 69.5 | 0.9 | 67.7 |
| 0.5 | 68.6 | 1.0 | 67.6 |
Table 3  Validation of the effectiveness of the boundary point estimation module (t in steps of 0.1)
| t | mIoU/% | t | mIoU/% |
| --- | --- | --- | --- |
| 0.32 | 67.4 | 0.42 | 69.2 |
| 0.34 | 67.9 | 0.44 | 69.1 |
| 0.36 | 68.2 | 0.46 | 68.9 |
| 0.38 | 68.8 | 0.48 | 68.8 |
| 0.40 | 69.5 |  |  |
Table 4  Validation of the effectiveness of the boundary point estimation module (t in steps of 0.02)
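Tables 3 and 4 together amount to a coarse-to-fine grid search over the threshold t: a first sweep in steps of 0.1, then a second sweep in steps of 0.02 around the best coarse value. A generic sketch of that procedure follows; the evaluation function is only a stand-in for retraining and validating the network at each t.

```python
import numpy as np

def coarse_to_fine(evaluate, lo=0.1, hi=1.0, coarse=0.1, fine=0.02):
    """Two-pass grid search for a scalar hyperparameter (as in Tables 3 and 4)."""
    grid = np.round(np.arange(lo, hi + 1e-9, coarse), 2)
    best = max(grid, key=evaluate)            # coarse pass, step 0.1
    fine_grid = np.round(np.arange(best - coarse + fine, best + coarse, fine), 2)
    return max(fine_grid, key=evaluate)       # fine pass, step 0.02

# stand-in objective peaking at t = 0.40 (the optimum reported in the tables)
score = lambda t: -(t - 0.40) ** 2
t_best = coarse_to_fine(score)
```

With these defaults the fine pass covers exactly 0.32 to 0.48 in steps of 0.02, matching the range explored in Table 4.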
| r/cm | mIoU/% | r/cm | mIoU/% |
| --- | --- | --- | --- |
| 1 | 68.3 | 5 | 69.3 |
| 2 | 68.8 | 6 | 69.2 |
| 3 | 69.1 | 7 | 68.6 |
| 4 | 69.5 | 8 | 68.1 |
Table 5  Influence of different voxel resolutions on segmentation results
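Table 5 varies the voxel edge length r. One side of the trade-off, namely how quickly the number of occupied voxels (and thus the sparse-convolution workload) shrinks as r grows, can be illustrated directly on toy data; `occupied_voxels` is a hypothetical helper, not part of the paper.

```python
import numpy as np

def occupied_voxels(points, r):
    """Count occupied cells when the cloud is voxelized with edge length r."""
    return len(np.unique(np.floor(points / r).astype(np.int64), axis=0))

rng = np.random.default_rng(0)
pts = rng.random((1000, 3)) * 100.0  # toy cloud spanning 100 cm per axis
counts = {r: occupied_voxels(pts, r) for r in (1, 2, 4, 8)}
```

Because each size-2r cell is a union of size-r cells, the occupied-voxel count is monotonically non-increasing in r; the other side of the trade-off, the quantization error that eventually hurts mIoU at large r, is what Table 5 measures.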
Fig. 6  Visualization of segmentation results of the ablation experiments on the S3DIS dataset
1 SHI S, GUO C, JIANG L, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 10526−10535.
2 CHABRA R, LENSSEN J, ILG E, et al. Deep local shapes: learning local SDF priors for detailed 3D reconstruction [C]// Proceedings of the European Conference on Computer Vision . Glasgow: Springer, 2020: 608−625.
3 HU W, ZHAO H, JIANG L, et al. Bidirectional projection network for cross dimension scene understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . [s. l.]: IEEE, 2021: 14373−14382.
4 DANG J S, YANG J. LHPHGCNN: lightweight hierarchical parallel heterogeneous group convolutional neural networks for point cloud scene prediction [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(10): 18903−18915. doi: 10.1109/TITS.2022.3167910
5 QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu: IEEE, 2017: 77−85.
6 QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: MIT Press, 2017: 5099−5108.
7 LAWIN F J, DANELLJAN M, TOSTEBERG P, et al. Deep projective 3D semantic segmentation [C]// International Conference on Computer Analysis of Images and Patterns . Ystad: Springer, 2017: 95−107.
8 BOULCH A, GUERRY J, SAUX B, et al. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks [J]. Computers & Graphics, 2018, 71: 189−198. doi: 10.1016/j.cag.2017.11.010
9 GUERRY J, BOULCH A, LE S, et al. SnapNet-R: consistent 3D multi-view semantic labeling for robotics [C]// Proceedings of the IEEE International Conference on Computer Vision . Venice: IEEE, 2017: 669−678.
10 CORTINHAL T, TZELEPIS G, ERDAL E, et al. SalsaNext: fast, uncertainty-aware semantic segmentation of LiDAR point clouds [C]// International Symposium on Visual Computing . San Diego: Springer, 2020: 207−222.
11 ÇICEK O, ABDULKADIR A, LIENKAMP S S, et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation [C]// Medical Image Computing and Computer-Assisted Intervention . Athens: Springer, 2016: 424−432.
12 WANG P S, LIU Y, GUO Y X, et al. O-CNN: octree-based convolutional neural networks for 3D shape analysis [J]. ACM Transactions on Graphics, 2017, 36(4): 1−11.
13 MENG H Y, GAO L, LAI Y K, et al. VV-Net: voxel VAE net with group convolutions for point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 8499−8507.
14 LE T, DUAN Y. PointGrid: a deep network for 3D shape understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 9204−9214.
15 WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds [J]. ACM Transactions on Graphics, 2018, 38(5): 146−158.
16 KANG Z H, LI N. PyramNet: point cloud pyramid attention network and graph embedding module for classification and segmentation [J]. Australian Journal of Intelligent Information Processing Systems, 2019, 16(2): 35−43.
17 DANG Jisheng, YANG Jun. 3D model recognition and segmentation based on multi-feature fusion [J]. Journal of Xidian University, 2020, 47(4): 149−157. (in Chinese)
18 HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 11105−11114.
19 LIU Z J, TANG H T, LIN Y J, et al. Point-voxel CNN for efficient 3D deep learning [C]// Advances in Neural Information Processing Systems . Vancouver: MIT Press, 2019: 963−973.
20 ZHANG F H, FANG J, WAH B, et al. Deep fusionnet for point cloud semantic segmentation [C]// Proceedings of the European Conference on Computer Vision . Glasgow: Springer, 2020: 644−663.
21 LIONG V E, NGUYEN T N T, WIDJAJA S, et al. AMVNet: assertion-based multi-view fusion network for LiDAR semantic segmentation [EB/OL]. (2020-12-09) [2023-02-12]. https://doi.org/10.48550/arXiv.2012.04934.
22 XU J Y, ZHANG R X, DOU J, et al. RPVNet: a deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal: IEEE, 2021: 16004−16013.
23 RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Medical Image Computing and Computer-Assisted Intervention . Munich: Springer, 2015: 234−241.
24 GRAHAM B, ENGELCKE M, MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 9224−9232.
25 YANG Jun, ZHANG Chen. 3D point cloud semantic segmentation combining dual attention mechanism and dynamic graph convolutional neural network [EB/OL]. (2023-01-10) [2023-02-12]. https://bhxb.buaa.edu.cn/bhzk/article/doi/10.13700/j.bh.1001-5965.2022.0775. (in Chinese)
26 ARMENI I, SENER O, ZAMIR A, et al. 3D semantic parsing of large-scale indoor spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 1534−1543.
27 BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 9296−9306.
28 TATARCHENKO M, PARK J, KOLTUN V, et al. Tangent convolutions for dense prediction in 3D [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 3887−3896.
29 LI Y, BU R, SUN M, et al. PointCNN: convolution on x-transformed points [C]// Advances in Neural Information Processing Systems . Montréal: MIT Press, 2018: 828−838.
30 LANDRIEU L, SIMONOVSKY M. Large-scale point cloud semantic segmentation with superpoint graphs [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 4558−4567.
31 ZHAO H, JIANG L, FU C W, et al. PointWeb: enhancing local neighborhood features for point cloud processing [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 5565−5573.
32 JIANG L, ZHAO H S, LIU S, et al. Hierarchical point-edge interaction network for point cloud semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 10432−10440.
33 WANG L, HUANG Y, HOU Y, et al. Graph attention convolution for point cloud semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach: IEEE, 2019: 10296−10305.
34 AHN P, YANG J, YI E, et al. Projection-based point convolution for efficient point cloud segmentation [J]. IEEE Access, 2022, 10: 15348−15358. doi: 10.1109/ACCESS.2022.3144449
35 QIU S, ANWAR S, BARNES N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [s. l.]: IEEE, 2021: 1757−1767.
36 THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul: IEEE, 2019: 6410−6419.
37 WEI M, WEI Z, ZHOU H, et al. AGConv: adaptive graph convolution on 3D point clouds [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9374−9392.
38 SHI H Y, LIN G S, WANG H, et al. SpSequenceNet: semantic segmentation network on 4D point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 4573–4582.
39 DANG J S, YANG J. HPGCNN: hierarchical parallel group convolutional neural networks for point clouds processing [C]// Proceedings of the Asian Conference on Computer Vision . Kyoto: Springer, 2020: 20−37.
40 MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: fast and accurate LiDAR semantic segmentation [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau: IEEE, 2019: 4213−4220.
41 ZHANG Y, ZHOU Z, DAVID P, et al. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9598−9607.
42 ALONSO I, RIAZUELO L, MONTESANO L, et al. 3D-MiniNet: learning a 2D representation from point clouds for fast and efficient 3D LiDAR semantic segmentation [J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5432−5439. doi: 10.1109/LRA.2020.3007440
43 YANG Jun, LI Bozan. Semantic segmentation of 3D point cloud based on self-attention feature fusion group convolutional neural network [J]. Optics and Precision Engineering, 2022, 30(7): 840−853. (in Chinese) doi: 10.37188/OPE.20223007.0840