Lightweight image semantic segmentation based on multi-level feature cascaded network
Deng-wen ZHOU, Jin-yue TIAN, Lu-yao MA, Xiu-xiu SUN
School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Abstract Semantic segmentation algorithms usually have complex network structures and a high computational cost. A lightweight image semantic segmentation algorithm based on a multi-level feature cascaded network was proposed to improve both the inference speed and the accuracy of semantic segmentation. The number of parameters, the running speed and the accuracy of the proposed network were considered jointly, so that the network can be better applied to embedded and mobile devices. A fine-tuned deep convolutional classification network was used for feature extraction, providing both semantic and location features from layers at different depths of the network. An atrous residual feature refinement module and a deep atrous spatial pyramid pooling module were used to process the deep and shallow features, respectively. The features from the deep and shallow layers were then fused in parallel at a specific dimension ratio. The mean intersection over union of the proposed approach on the PASCAL VOC 2012 dataset was 77.13%. Compared with current state-of-the-art semantic segmentation algorithms and real-time semantic segmentation algorithms, the proposed method strikes a better balance between real-time performance and segmentation accuracy, and has good practical value.
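The reported metric, mean intersection over union (mIoU), can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the default `ignore_label=255` (the void label commonly used in PASCAL VOC annotations) are assumptions made for illustration. Benchmark scores are normally computed from counts accumulated over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection over union for two flat sequences of class labels.

    Assumes every prediction is a valid class index in [0, num_classes).
    Pixels whose ground-truth label equals ignore_label are skipped.
    """
    inter = [0] * num_classes  # pixels where prediction and truth agree, per class
    union = [0] * num_classes  # pixels assigned to the class by either one
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[t] += 1
            union[p] += 1
    # Average only over classes that occur in the prediction or the truth.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.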
Received: 08 July 2019
Published: 28 August 2020
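Lightweight image semantic segmentation based on multi-level feature parallel connection
To address the complex network structures and heavy computational cost common to current semantic segmentation algorithms, and to improve both the real-time performance and the accuracy of semantic segmentation, a computationally efficient lightweight image semantic segmentation network based on a multi-level feature parallel network (LSSN) was proposed. The algorithm jointly considers the number of parameters, the running speed and the performance of the network, and can be better applied to embedded and mobile devices. A fine-tuned deep convolutional classification network was used as the feature extraction network to extract semantic and location features from layers at different depths. An atrous residual enhancement module and a deep atrous spatial pyramid module were proposed to process the deep features and the shallow features from the feature extraction backbone, respectively, and the deep and shallow features were fused in parallel at a specific dimension ratio. The proposed method achieved an accuracy (mean intersection over union) of 77.13% on the PASCAL VOC 2012 dataset. Compared with current high-performance semantic segmentation algorithms and real-time semantic segmentation algorithms, it better balances real-time performance and accuracy, and offers greater practical value and better performance.
Keywords:
deep learning,
fully convolutional neural network,
semantic segmentation,
feature fusion,
atrous convolution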
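Atrous (dilated) convolution, listed among the keywords, is the operation underlying both modules named in the abstract. The one-dimensional Python sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `atrous_conv1d`, the toy signal and the kernel are hypothetical, and real networks apply the two-dimensional analogue with learned weights. It shows how a larger dilation rate widens the receptive field without adding parameters.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous convolution (cross-correlation, as used in deep learning).

    Kernel taps are spaced `rate` samples apart, so a kernel of length K
    covers (K - 1) * rate + 1 input samples while keeping only K weights.
    """
    span = (len(kernel) - 1) * rate + 1  # effective receptive field
    return [
        sum(w * signal[start + k * rate] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7]
    k = [1, 0, -1]  # hypothetical 3-tap difference kernel
    for r in (1, 2, 3):
        print(r, atrous_conv1d(x, k, rate=r))
```

With the same three weights, rates 1, 2 and 3 cover 3, 5 and 7 input samples respectively, which is the property a pyramid of atrous convolutions exploits to gather context at several scales.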