Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network
Jun YANG 1,2, Chen ZHANG 1
1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China
Large-scale point clouds are sparse, traditional point-based methods extract insufficiently rich contextual semantic features, and the resulting segmentations suffer from blurred object boundaries. A 3D point cloud semantic segmentation algorithm based on boundary point estimation and a sparse convolutional neural network was proposed, consisting of a voxel branch and a point branch. In the voxel branch, the original point cloud was voxelized and contextual semantic features were extracted by sparse convolution; these features were mapped back to the points to assign each point an initial semantic label, which was then fed into the boundary point estimation module to identify likely boundary points. In the point branch, an improved dynamic graph convolution module first extracted local geometric features of the point cloud, which were then enhanced by a spatial attention module and a channel attention module in turn. Finally, the local geometric features from the point branch and the contextual features from the voxel branch were fused to enrich the point cloud representation. The proposed algorithm achieved a semantic segmentation accuracy (mIoU) of 69.5% on the S3DIS dataset and 62.7% on the SemanticKITTI dataset. Experimental results show that the proposed algorithm extracts richer point cloud features, segments object boundary regions accurately, and has good semantic segmentation ability for 3D point clouds.
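To make the two-branch design above concrete, the following is a minimal PyTorch sketch, not the authors' released implementation: the EdgeConv block stands in for the improved dynamic graph convolution module, the two attention gates are assumed SE-style forms, the channel widths are arbitrary, and the sparse-convolution U-Net of the voxel branch is abstracted as a precomputed per-point `voxel_context` tensor (its devoxelized output).

```python
import torch
import torch.nn as nn

def knn(xyz, k):
    """Indices of the k nearest neighbors of every point. xyz: (B, N, 3).
    Includes the point itself; harmless for the max-pooled edge features."""
    d = torch.cdist(xyz, xyz)                        # (B, N, N) pairwise distances
    return d.topk(k, largest=False).indices          # (B, N, k)

def gather_neighbors(feat, idx):
    """Gather neighbor features. feat: (B, N, C), idx: (B, N, k) -> (B, N, k, C)."""
    B, N, C = feat.shape
    base = torch.arange(B, device=feat.device).view(B, 1, 1) * N
    return feat.reshape(B * N, C)[(idx + base).reshape(-1)].view(B, N, -1, C)

class EdgeConv(nn.Module):
    """DGCNN-style edge convolution, a stand-in for the paper's improved
    dynamic graph convolution module (point branch)."""
    def __init__(self, c_in, c_out, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * c_in, c_out), nn.ReLU())

    def forward(self, xyz, feat):
        nbr = gather_neighbors(feat, knn(xyz, self.k))   # (B, N, k, C)
        ctr = feat.unsqueeze(2).expand_as(nbr)
        edge = torch.cat([ctr, nbr - ctr], dim=-1)       # edge features
        return self.mlp(edge).max(dim=2).values          # max over neighbors

class SpatialAttention(nn.Module):
    """Per-point gate over the spatial dimension (assumed form)."""
    def __init__(self, c):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(c, 1), nn.Sigmoid())

    def forward(self, feat):                             # (B, N, C)
        return feat * self.score(feat)

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel gate (assumed form)."""
    def __init__(self, c, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // reduction), nn.ReLU(),
                                nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, feat):                             # (B, N, C)
        return feat * self.fc(feat.mean(dim=1, keepdim=True))

class TwoBranchSegNet(nn.Module):
    """Fuses point-branch geometry with voxel-branch context, then classifies.
    `voxel_context` is assumed to be the devoxelized per-point output of the
    sparse-convolution U-Net, which is not reimplemented here."""
    def __init__(self, c_point=64, c_voxel=64, n_classes=13):
        super().__init__()
        self.edge = EdgeConv(3, c_point)
        self.spatial = SpatialAttention(c_point)
        self.channel = ChannelAttention(c_point)
        self.head = nn.Linear(c_point + c_voxel, n_classes)

    def forward(self, xyz, voxel_context):
        f = self.edge(xyz, xyz)                      # local geometric features
        f = self.channel(self.spatial(f))            # spatial then channel attention
        fused = torch.cat([f, voxel_context], -1)    # point-voxel feature fusion
        return self.head(fused)                      # per-point class logits

# Smoke test on random data; 13 classes as in S3DIS.
xyz = torch.rand(2, 1024, 3)
ctx = torch.rand(2, 1024, 64)                        # placeholder voxel-branch output
print(TwoBranchSegNet()(xyz, ctx).shape)             # torch.Size([2, 1024, 13])
```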
Jun YANG, Chen ZHANG. Semantic segmentation of 3D point cloud based on boundary point estimation and sparse convolution neural network. Journal of Zhejiang University (Engineering Science), 2024, 58(6): 1121−1132.
Fig.1 Architecture of 3D point cloud semantic segmentation network based on boundary point estimation and sparse convolution neural network
Fig.2 Structure of U-Net based on sparse convolution
Fig.3 Structure of boundary point estimation module
| Method | OA/% | mIoU/% | ceiling | floor | wall | beam | column | window | door | table | chair | sofa | bookcase | board | clutter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet [5] | 79.3 | 41.1 | 88.8 | 97.3 | 69.8 | 0.1 | 3.9 | 46.3 | 10.8 | 59.0 | 52.6 | 5.9 | 40.3 | 26.4 | 33.2 |
| TangentConv [28] | 82.5 | 52.6 | 90.5 | 97.7 | 74.0 | 0.0 | 20.7 | 39.0 | 31.3 | 77.5 | 69.4 | 57.3 | 38.5 | 48.8 | 39.8 |
| PointCNN [29] | 85.9 | 57.3 | 92.3 | 98.2 | 79.4 | 0.0 | 17.6 | 22.8 | 62.1 | 74.4 | 80.6 | 31.7 | 66.7 | 62.1 | 56.7 |
| SPG [30] | 86.4 | 58.0 | 89.4 | 96.9 | 78.1 | 0.0 | 42.8 | 48.9 | 61.6 | 84.7 | 75.4 | 69.8 | 52.6 | 2.1 | 52.2 |
| PointWeb [31] | 87.0 | 60.3 | 92.0 | 98.5 | 79.4 | 0.0 | 21.1 | 59.7 | 34.8 | 76.3 | 88.3 | 46.9 | 69.3 | 64.9 | 52.5 |
| HPEIN [32] | 87.2 | 61.9 | 91.5 | 98.2 | 81.4 | 0.0 | 23.3 | 65.3 | 40.0 | 75.5 | 87.7 | 58.5 | 67.8 | 65.6 | 49.4 |
| RandLA-Net [18] | 87.2 | 62.4 | 91.1 | 95.6 | 80.2 | 0.0 | 24.7 | 62.3 | 47.7 | 76.2 | 83.7 | 60.2 | 71.1 | 65.7 | 53.8 |
| GACNet [33] | 87.8 | 62.8 | 92.3 | 98.3 | 81.9 | 0.0 | 20.3 | 59.1 | 40.8 | 78.5 | 85.8 | 61.7 | 70.7 | 74.7 | 52.8 |
| PPCNN++ [34] | — | 64.0 | 94.0 | 98.5 | 83.7 | 0.0 | 18.6 | 66.1 | 61.7 | 79.4 | 88.0 | 49.5 | 70.1 | 66.4 | 56.1 |
| BAAF-Net [35] | 88.9 | 65.4 | 92.9 | 97.9 | 82.3 | 0.0 | 23.1 | 65.5 | 64.9 | 78.5 | 87.5 | 61.4 | 70.7 | 68.7 | 57.2 |
| KPConv [36] | — | 67.1 | 92.8 | 97.3 | 82.4 | 0.0 | 23.9 | 58.0 | 69.0 | 81.5 | 91.0 | 75.4 | 75.3 | 66.7 | 58.9 |
| AGConv [37] | 90.0 | 67.9 | 93.9 | 98.4 | 82.2 | 0.0 | 23.9 | 59.1 | 71.3 | 91.5 | 81.2 | 75.5 | 74.9 | 72.1 | 58.6 |
| Ours | 90.8 | 69.5 | 94.4 | 99.2 | 87.2 | 0.0 | 27.2 | 62.2 | 72.8 | 91.8 | 85.8 | 79.0 | 66.7 | 74.4 | 62.9 |
Tab.1 Comparison of segmentation accuracy of different methods on S3DIS dataset (Area 5 as the test set; per-class values are IoU/%)
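For reference, the OA and mIoU values reported in Tab. 1 and Tab. 2 follow the standard definitions. A minimal sketch of computing them from predicted and ground-truth labels; the helper name and the masking of classes absent from the scene are my choices, not the paper's code:

```python
import torch

def oa_miou(pred: torch.Tensor, target: torch.Tensor, n_classes: int):
    """Overall accuracy (OA) and mean IoU from integer label vectors.
    pred, target: (N,) predicted / ground-truth class indices."""
    conf = torch.bincount(target * n_classes + pred,
                          minlength=n_classes ** 2).view(n_classes, n_classes)
    tp = conf.diag()                          # true positives per class
    union = conf.sum(0) + conf.sum(1) - tp    # TP + FP + FN per class
    present = union > 0                       # skip classes absent from the scene
    iou = tp[present].float() / union[present].float()
    oa = tp.sum().float() / conf.sum().float()
    return oa.item(), iou.mean().item()

# e.g. oa, miou = oa_miou(pred, target, n_classes=13)   # 13 classes for S3DIS
```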
Fig.4 Visualization of segmentation results of S3DIS dataset
| Method | mIoU/% | road | sidewalk | parking | other-ground | building | car | truck | bicycle | motorcycle | other-vehicle | vegetation | trunk | terrain | person | bicyclist | motorcyclist | fence | pole | traffic-sign |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PointNet [5] | 14.6 | 61.6 | 35.7 | 15.8 | 1.4 | 41.4 | 46.3 | 0.1 | 1.3 | 0.3 | 0.8 | 31.0 | 4.6 | 17.6 | 0.2 | 0.2 | 0.0 | 12.9 | 2.4 | 3.7 |
| SPG [30] | 17.4 | 45.0 | 28.5 | 1.6 | 0.6 | 64.3 | 49.3 | 0.1 | 0.2 | 0.2 | 0.8 | 48.9 | 27.2 | 24.6 | 0.3 | 2.7 | 0.1 | 20.8 | 15.9 | 0.8 |
| PointNet++ [6] | 20.1 | 72.0 | 41.8 | 18.7 | 5.6 | 62.3 | 53.7 | 0.9 | 1.9 | 0.2 | 0.2 | 46.5 | 13.8 | 30.0 | 0.9 | 1.0 | 0.0 | 16.9 | 6.0 | 8.9 |
| TangentConv [28] | 40.9 | 83.9 | 63.9 | 33.4 | 15.4 | 83.4 | 90.8 | 15.2 | 2.7 | 16.5 | 12.1 | 79.5 | 49.3 | 58.1 | 23.0 | 28.4 | 8.1 | 49.0 | 35.8 | 28.5 |
| SpSequenceNet [38] | 43.1 | 90.1 | 73.9 | 57.6 | 27.1 | 91.2 | 88.5 | 29.2 | 24.0 | 0.0 | 22.7 | 84.0 | 66.0 | 65.7 | 6.3 | 0.0 | 0.0 | 67.7 | 50.8 | 48.7 |
| HPGCNN [39] | 50.5 | 89.5 | 73.6 | 58.8 | 34.6 | 91.2 | 93.1 | 21.0 | 6.5 | 17.6 | 23.3 | 84.4 | 65.9 | 70.0 | 32.1 | 30.0 | 14.7 | 65.5 | 45.5 | 41.5 |
| RangeNet++ [40] | 52.2 | 91.8 | 75.2 | 65.0 | 27.8 | 87.4 | 91.4 | 25.7 | 25.7 | 34.4 | 23.0 | 80.5 | 55.1 | 64.6 | 38.3 | 38.8 | 4.8 | 58.6 | 47.9 | 55.9 |
| RandLA-Net [18] | 53.9 | 90.7 | 73.7 | 60.3 | 20.4 | 86.9 | 94.2 | 40.1 | 26.0 | 25.8 | 38.9 | 81.4 | 61.3 | 66.8 | 49.2 | 48.2 | 7.2 | 56.3 | 49.2 | 47.7 |
| PolarNet [41] | 54.3 | 90.8 | 74.4 | 61.7 | 21.7 | 90.0 | 93.8 | 22.9 | 40.3 | 30.1 | 28.5 | 84.0 | 65.5 | 67.8 | 43.2 | 40.2 | 5.6 | 61.3 | 51.8 | 57.5 |
| 3D-MiniNet [42] | 55.8 | 91.6 | 74.5 | 64.2 | 25.4 | 89.4 | 90.5 | 28.5 | 42.3 | 42.1 | 29.4 | 82.8 | 60.8 | 66.7 | 47.8 | 44.1 | 14.5 | 60.8 | 48.0 | 56.6 |
| SAFFGCNN [43] | 56.6 | 89.9 | 73.9 | 63.5 | 35.1 | 91.5 | 95.0 | 38.3 | 33.2 | 35.1 | 28.7 | 84.4 | 67.1 | 69.5 | 45.3 | 43.5 | 7.3 | 66.1 | 54.3 | 53.7 |
| KPConv [36] | 58.8 | 88.8 | 72.7 | 61.3 | 31.6 | 90.5 | 96.0 | 33.4 | 30.2 | 42.5 | 31.6 | 84.8 | 69.2 | 69.1 | 61.5 | 61.6 | 11.8 | 64.2 | 56.4 | 48.4 |
| BAAF-Net [35] | 59.9 | 90.9 | 74.4 | 62.2 | 23.6 | 89.8 | 95.4 | 48.7 | 31.8 | 35.5 | 46.7 | 82.7 | 63.4 | 67.9 | 49.5 | 55.7 | 53.0 | 60.8 | 53.7 | 52.0 |
| TORNADONet [44] | 61.1 | 90.8 | 75.3 | 65.3 | 27.5 | 89.6 | 93.1 | 43.1 | 53.0 | 44.4 | 39.4 | 84.1 | 64.3 | 69.6 | 61.6 | 56.7 | 20.2 | 62.9 | 55.0 | 64.2 |
| FusionNet [20] | 61.3 | 91.8 | 77.1 | 68.8 | 30.8 | 92.5 | 95.3 | 41.8 | 47.5 | 37.7 | 34.5 | 84.5 | 69.8 | 68.5 | 59.5 | 56.8 | 11.9 | 69.4 | 60.0 | 66.5 |
| Ours | 62.7 | 92.7 | 78.5 | 71.6 | 31.5 | 91.4 | 95.5 | 40.9 | 46.1 | 48.0 | 42.2 | 85.2 | 68.4 | 70.2 | 63.9 | 54.3 | 23.8 | 68.6 | 56.7 | 62.8 |
Tab.2 Comparison of segmentation accuracy of different methods on SemanticKITTI dataset (per-class values are IoU/%)
Fig.5 Visualization of segmentation results of SemanticKITTI dataset
| t | mIoU/% | t | mIoU/% |
| --- | --- | --- | --- |
| 0.1 | 64.5 | 0.6 | 68.2 |
| 0.2 | 65.9 | 0.7 | 68.0 |
| 0.3 | 67.4 | 0.8 | 67.9 |
| 0.4 | 69.5 | 0.9 | 67.7 |
| 0.5 | 68.6 | 1.0 | 67.6 |
Tab.3 Effectiveness verification of boundary point estimation module with t interval of 0.1
| t | mIoU/% | t | mIoU/% |
| --- | --- | --- | --- |
| 0.32 | 67.4 | 0.42 | 69.2 |
| 0.34 | 67.9 | 0.44 | 69.1 |
| 0.36 | 68.2 | 0.46 | 68.9 |
| 0.38 | 68.8 | 0.48 | 68.8 |
| 0.40 | 69.5 | — | — |
Tab.4 Effectiveness verification of boundary point estimation module with t interval of 0.02
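Tabs. 3 and 4 sweep the threshold t coarsely and then finely, with t = 0.40 giving the best mIoU (69.5%). One plausible reading of how such a threshold enters boundary point estimation is sketched below: each point is scored by how strongly the initial labels in its neighborhood disagree with its own, and points whose score reaches t are kept. The scoring rule is a hypothetical illustration, not the paper's exact module.

```python
import torch

def boundary_candidates(xyz, init_labels, k=8, t=0.40):
    """Hypothetical boundary point estimation: score each point by the
    fraction of its k nearest neighbors whose initial semantic label
    disagrees with its own, then keep points with score >= t.
    xyz: (N, 3) coordinates; init_labels: (N,) initial labels."""
    d = torch.cdist(xyz, xyz)                           # O(N^2); a spatial index
    idx = d.topk(k + 1, largest=False).indices[:, 1:]   # would be used in practice
    disagree = (init_labels[idx] != init_labels.unsqueeze(1)).float()
    score = disagree.mean(dim=1)                        # boundary score in [0, 1]
    return score >= t                                   # boolean boundary mask
```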
| r/cm | mIoU/% | r/cm | mIoU/% |
| --- | --- | --- | --- |
| 1 | 68.3 | 5 | 69.3 |
| 2 | 68.8 | 6 | 69.2 |
| 3 | 69.1 | 7 | 68.6 |
| 4 | 69.5 | 8 | 68.1 |
Tab.5 Influence of different voxel resolutions on segmentation results
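Tab. 5 selects a voxel edge length of r = 4 cm for the voxel branch. A minimal sketch of this voxelization step follows (a hypothetical helper, not the released code); the returned point-to-voxel index is what later lets voxel features be mapped back to points.

```python
import torch

def voxelize(xyz, r=0.04):
    """Assign each point to a cubic voxel of edge length r (metres);
    r = 0.04 (4 cm) was best in Tab. 5. Returns the unique voxel
    coordinates and, for each point, the index of its voxel (used
    later to map voxel features back to points)."""
    coords = torch.floor(xyz / r).long()                # integer grid coordinates
    voxels, point2voxel = torch.unique(coords, dim=0, return_inverse=True)
    return voxels, point2voxel
```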
Fig.6 Visualization of segmentation results of ablation experiments on S3DIS dataset
[1]
SHI S, GUO C, JIANG L, et al. PV-RCNN: point-voxel feature set abstraction for 3D object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 10526−10535.
[2]
CHABRA R, LENSSEN J, ILG E, et al. Deep local shapes: learning local SDF priors for detailed 3D reconstruction [C]// Proceedings of the European Conference on Computer Vision. Glasgow: Springer, 2020: 608−625.
[3]
HU W, ZHAO H, JIANG L, et al. Bidirectional projection network for cross dimension scene understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [s. l.]: IEEE, 2021: 14373−14382.
[4]
DANG J S, YANG J. LHPHGCNN: lightweight hierarchical parallel heterogeneous group convolutional neural networks for point cloud scene prediction [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(10): 18903−18915.
doi: 10.1109/TITS.2022.3167910
[5]
QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 77−85.
[6]
QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: MIT Press, 2017: 5099−5108.
[7]
LAWIN F J, DANELLJAN M, TOSTEBERG P, et al. Deep projective 3D semantic segmentation [C]// International Conference on Computer Analysis of Images and Patterns. Ystad: Springer, 2017: 95−107.
[8]
BOULCH A, GUERRY J, SAUX B, et al. SnapNet: 3D point cloud semantic labeling with 2D deep segmentation networks [J]. Computers and Graphics, 2018, 71: 189−198.
doi: 10.1016/j.cag.2017.11.010
[9]
GUERRY J, BOULCH A, LE S, et al. SnapNet-R: consistent 3D multi-view semantic labeling for robotics [C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 669−678.
[10]
CORTINHAL T, TZELEPIS G, ERDAL E, et al. SalsaNext: fast, uncertainty-aware semantic segmentation of LiDAR point clouds [C]// International Symposium on Visual Computing. San Diego: Springer, 2020: 207−222.
[11]
ÇICEK O, ABDULKADIR A, LIENKAMP S S, et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation [C]// Medical Image Computing and Computer-Assisted Intervention. Athens: Springer, 2016: 424−432.
[12]
WANG P S, LIU Y, GUO Y X, et al. O-CNN: octree-based convolutional neural networks for 3D shape analysis [J]. ACM Transactions on Graphics, 2017, 36(4): 1−11.
[13]
MENG H Y, GAO L, LAI Y K, et al. VV-Net: voxel VAE net with group convolutions for point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 8499−8507.
[14]
LE T, DUAN Y. PointGrid: a deep network for 3D shape understanding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9204−9214.
[15]
WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds [J]. ACM Transactions on Graphics, 2019, 38(5): 146−158.
[16]
KANG Z H, LI N. PyramNet: point cloud pyramid attention network and graph embedding module for classification and segmentation [J]. Australian Journal of Intelligent Information Processing Systems, 2019, 16(2): 35−43.
[17]
DANG Jisheng, YANG Jun. 3D model recognition and segmentation based on multi-feature fusion [J]. Journal of Xidian University, 2020, 47(4): 149−157. (in Chinese)
[18]
HU Q Y, YANG B, XIE L H, et al. RandLA-Net: efficient semantic segmentation of large-scale point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 11105−11114.
[19]
LIU Z J, TANG H T, LIN Y J, et al. Point-voxel CNN for efficient 3D deep learning [C]// Advances in Neural Information Processing Systems. Vancouver: MIT Press, 2019: 963−973.
[20]
ZHANG F H, FANG J, WAH B, et al. Deep FusionNet for point cloud semantic segmentation [C]// Proceedings of the European Conference on Computer Vision. Glasgow: Springer, 2020: 644−663.
[21]
LIONG V E, NGUYEN T N T, WIDJAJA S, et al. AMVNet: assertion-based multi-view fusion network for LiDAR semantic segmentation [EB/OL]. (2020-12-09) [2023-02-12]. https://doi.org/10.48550/arXiv.2012.04934.
[22]
XU J Y, ZHANG R X, DOU J, et al. RPVNet: a deep and efficient range-point-voxel fusion network for LiDAR point cloud segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 16004−16013.
[23]
RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 2015: 234−241.
[24]
GRAHAM B, ENGELCKE M, MAATEN L. 3D semantic segmentation with submanifold sparse convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9224−9232.
[25]
ARMENI I, SENER O, ZAMIR A, et al. 3D semantic parsing of large-scale indoor spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1534−1543.
[27]
BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9296−9306.
[28]
TATARCHENKO M, PARK J, KOLTUN V, et al. Tangent convolutions for dense prediction in 3D [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3887−3896.
[29]
LI Y, BU R, SUN M, et al. PointCNN: convolution on X-transformed points [C]// Advances in Neural Information Processing Systems. Montréal: MIT Press, 2018: 828−838.
[30]
LANDRIEU L, SIMONOVSKY M. Large-scale point cloud semantic segmentation with superpoint graphs [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4558−4567.
[31]
ZHAO H, JIANG L, FU C W, et al. PointWeb: enhancing local neighborhood features for point cloud processing [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5565−5573.
[32]
JIANG L, ZHAO H S, LIU S, et al. Hierarchical point-edge interaction network for point cloud semantic segmentation [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 10432−10440.
[33]
WANG L, HUANG Y, HOU Y, et al. Graph attention convolution for point cloud semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 10296−10305.
[34]
AHN P, YANG J, YI E, et al. Projection-based point convolution for efficient point cloud segmentation [J]. IEEE Access, 2022, 10: 15348−15358.
doi: 10.1109/ACCESS.2022.3144449
[35]
QIU S, ANWAR S, BARNES N. Semantic segmentation for real point cloud scenes via bilateral augmentation and adaptive fusion [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [s. l.]: IEEE, 2021: 1757−1767.
[36]
THOMAS H, QI C R, DESCHAUD J E, et al. KPConv: flexible and deformable convolution for point clouds [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6410−6419.
[37]
WEI M, WEI Z, ZHOU H, et al. AGConv: adaptive graph convolution on 3D point clouds [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(8): 9374−9392.
[38]
SHI H Y, LIN G S, WANG H, et al. SpSequenceNet: semantic segmentation network on 4D point clouds [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 4573−4582.
[39]
DANG J S, YANG J. HPGCNN: hierarchical parallel group convolutional neural networks for point clouds processing [C]// Proceedings of the Asian Conference on Computer Vision. Kyoto: Springer, 2020: 20−37.
[40]
MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: fast and accurate LiDAR semantic segmentation [C]// IEEE/RSJ International Conference on Intelligent Robots and Systems. Macau: IEEE, 2019: 4213−4220.
[41]
ZHANG Y, ZHOU Z, DAVID P, et al. PolarNet: an improved grid representation for online LiDAR point clouds semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 9598−9607.
[42]
ALONSO I, RIAZUELO L, MONTESANO L, et al. 3D-MiniNet: learning a 2D representation from point clouds for fast and efficient 3D LiDAR semantic segmentation [J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5432−5439.
doi: 10.1109/LRA.2020.3007440
[43]
YANG Jun, LI Bozan. Semantic segmentation of 3D point cloud based on self-attention feature fusion group convolutional neural network [J]. Optics and Precision Engineering, 2022, 30(7): 840−853. (in Chinese)
doi: 10.37188/OPE.20223007.0840