To address the neglect of surface normal attributes in road scene detection, and to strengthen the use of spatial context and edge information, a road scene normal region segmentation method combined with a spatial context algorithm was proposed; the road scene was partitioned into horizontal and vertical regions corresponding to the road and to obstacles, respectively. On the basis of the cross-entropy loss function, an obstacle enhancement loss was added to improve the weight distribution of the different categories during training and to raise the recognition rate of obstacles occupying small areas. A context improvement algorithm was proposed to optimize the matrix computation of the position association map, reducing space complexity and improving computational efficiency. An edge context module was embedded to suppress noise and strengthen the main edges, enhancing the use of edge information. Experimental results on the self-built dataset and the Cityscapes dataset show that, compared with mainstream semantic segmentation methods, the proposed method strengthens the network's feature extraction ability and effectively improves the segmentation accuracy of road normal regions; its intersection over union (IoU) is 2.1% higher than that of Deeplab, so obstacle avoidance tasks can be achieved easily and effectively.
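The obstacle enhancement loss is described above only at a high level. A minimal sketch of a per-class weighted cross-entropy of this general kind (in NumPy, with hypothetical names; the paper's exact weighting scheme is not specified here) is:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Pixel-wise cross-entropy with per-class weights.

    probs:  (N, C) predicted class probabilities per pixel
    labels: (N,)   ground-truth class index per pixel
    class_weights: (C,) weight per class; raising the weight of the
    obstacle (vertical-region) class mimics, in spirit, the obstacle
    enhancement loss described above.
    """
    n = probs.shape[0]
    p_true = probs[np.arange(n), labels]   # probability of the true class
    w = class_weights[labels]              # weight of each pixel's class
    return float(np.mean(-w * np.log(p_true + 1e-12)))
```

With uniform weights this reduces to the standard cross-entropy; increasing the obstacle class weight makes errors on small obstacle regions cost proportionally more during training.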
Xue-yun CHEN, Qu YAO, Qi-chen DING, Xue-yu BEI, Xiao-qiao HUANG, Xin JIN. Normal region segmentation of road scene based on spatial context algorithm. Journal of Zhejiang University (Engineering Science), 2021, 55(11): 2013-2021.
Fig. 1 Normal region segmentation network structure diagram
Fig. 2 Schematic diagram of context algorithm
Fig. 3 Structure diagram of spatial context module
Fig. 4 Non-local algorithm structure diagram
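The non-local operation of [17], which the figure above depicts, can be sketched as follows. Identity projections stand in for the learned 1×1 convolutions, and this shows only the standard formulation: the HW×HW position association matrix built here is the quadratic-memory step that the proposed context improvement algorithm reorders to reduce space complexity.

```python
import numpy as np

def non_local_attention(x):
    """Generic non-local (self-attention) block over a feature map.

    x: (HW, C) flattened feature map. theta/phi/g would normally be
    learned projections; they are left as identities for brevity.
    """
    theta, phi, g = x, x, x
    assoc = theta @ phi.T                        # (HW, HW) position association map
    assoc = np.exp(assoc - assoc.max(axis=1, keepdims=True))
    assoc /= assoc.sum(axis=1, keepdims=True)    # row-wise softmax
    return x + assoc @ g                         # residual connection
```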
Fig. 5 Edge context module structure
Fig. 6 Comparison diagram of normal detection network experiment
| Network | Mean | Median | RMSE/% | P1/% | P2/% | P3/% |
| --- | --- | --- | --- | --- | --- | --- |
| MarrNet[8] | 35.18 | 32.16 | 38.64 | 18.38 | 42.74 | 61.51 |
| Method of Ref. [10] | 28.95 | 25.43 | 34.35 | 35.37 | 59.26 | 82.89 |
| Proposed algorithm | 25.26 | 22.57 | 29.81 | 41.75 | 72.68 | 91.27 |

Tab. 1 Test results comparison of different normal detection networks
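The metrics in Tab. 1 are the standard surface-normal error measures. A sketch of how they are conventionally computed is below; the assumption here is that P1/P2/P3 correspond to the usual angular-error thresholds of 11.25°, 22.5° and 30°, which the table itself does not state.

```python
import numpy as np

def normal_error_metrics(pred, gt):
    """Angular-error metrics between predicted and ground-truth normals.

    pred, gt: (N, 3) unit normal vectors. Returns mean/median angular
    error and RMSE (degrees), plus the fraction of pixels whose error
    falls below 11.25 / 22.5 / 30 degrees.
    """
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))             # per-pixel angular error
    return {
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "p1": float(np.mean(err < 11.25)),
        "p2": float(np.mean(err < 22.5)),
        "p3": float(np.mean(err < 30.0)),
    }
```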
Fig. 7 Comparison diagram of segmentation network experiments
| Network | IoU/% | MPA/% |
| --- | --- | --- |
| FCN[12] | 63.43 | 75.17 |
| UNet[14] | 71.25 | 80.36 |
| PSPnet[15] | 70.82 | 80.04 |
| Deeplab[16] | 71.66 | 82.59 |
| Proposed algorithm | 73.76 | 85.33 |

Tab. 2 Test results of different segmentation networks
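The IoU/% and MPA/% columns in Tab. 2 are assumed to follow the common semantic-segmentation definitions, which can be computed from a confusion matrix as sketched here:

```python
import numpy as np

def iou_and_mpa(conf):
    """Mean IoU and mean pixel accuracy from a confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted
    as class j. Returns (mean IoU, mean pixel accuracy) as fractions.
    """
    conf = conf.astype(float)
    tp = np.diag(conf)                                    # true positives per class
    iou = tp / (conf.sum(axis=0) + conf.sum(axis=1) - tp) # TP / (TP + FP + FN)
    mpa = np.mean(tp / conf.sum(axis=1))                  # per-class accuracy, averaged
    return float(np.mean(iou)), float(mpa)
```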
| Network | IoU/% | MPA/% |
| --- | --- | --- |
| Backbone | 70.58 | 82.17 |
| Backbone+SCM | 72.74 | 84.66 |
| Backbone+Non-local | 72.69 | 84.40 |
| Backbone+SCM+ECM | 73.31 | 85.57 |
| Backbone+SCM+ECM+Lo | 73.76 | 85.33 |

Tab. 3 Test results of improvement effect of each module
[1] HOIEM D, EFROS A, HEBERT M. Recovering surface layout from an image[J]. International Journal of Computer Vision, 2007, 75(1): 151-172. doi: 10.1007/s11263-006-0031-y
[2] FOUHEY D, GUPTA A, HEBERT M. Data-driven 3D primitives for single image understanding[C]// International Conference on Computer Vision. Sydney: IEEE, 2013: 3392-3399.
[3] LADICKY L, ZEISL B, POLLEFEYS M. Discriminatively trained dense surface normal estimation[C]// European Conference on Computer Vision. Cham: Springer, 2014: 468-484.
[4] KUSUPATI U, CHENG S, CHEN R, et al. Normal assisted stereo depth estimation[C]// Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2189-2199.
[5] ZHANG S, XIE W, ZHANG G, et al. Robust stereo matching with surface normal prediction[C]// International Conference on Robotics and Automation. Singapore: IEEE, 2017: 2540-2547.
[6] EIGEN D, FERGUS R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]// International Conference on Computer Vision. Santiago: IEEE, 2015: 2650-2658.
[7] KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[8] BANSAL A, RUSSELL B, GUPTA A. Marr revisited: 2D-3D alignment via surface normal prediction[C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 5965-5974.
[9] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]// International Conference on Learning Representations. San Diego: ICLR, 2015.
[10] XIAN Chu-hua, LIU Xin, LI Gui-qing, et al. Normal estimation from single monocular images based on multi-scale convolution network[J]. Journal of South China University of Technology: Natural Science Edition, 2018, 46(12): 7-15. (in Chinese)
[11] HAN Y, ZHANG S, ZHANG Y, et al. Monocular depth estimation with guidance of surface normal map[J]. Neurocomputing, 2018, 280: 86-100. doi: 10.1016/j.neucom.2017.08.074
[12] SHELHAMER E, LONG J, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651. doi: 10.1109/TPAMI.2016.2572683
[13] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[14] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 2015: 234-241.
[15] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 6230-6239.
[16] CHEN L, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[17] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7794-7803.
[18] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]// Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3141-3149.
[19] CAO Y, XU J, LIN S, et al. GCNet: non-local networks meet squeeze-excitation networks and beyond[C]// International Conference on Computer Vision Workshops. Seoul: IEEE, 2019: 1971-1980.