Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (8): 1516-1524    DOI: 10.3785/j.issn.1008-973X.2020.08.009
    
Lightweight image semantic segmentation based on multi-level feature cascaded network
Deng-wen ZHOU, Jin-yue TIAN, Lu-yao MA, Xiu-xiu SUN
School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China

Abstract  

Semantic segmentation algorithms usually have complex network structures and heavy computational cost. To improve both the inference speed and the accuracy of semantic segmentation, a computationally efficient lightweight image semantic segmentation network (LSSN) based on a multi-level feature cascaded network was proposed. The design jointly considers the number of parameters, running speed and accuracy, so the network is better suited to embedded and mobile devices. A fine-tuned deep convolutional classification network was used for feature extraction, capturing both semantic and location features from layers of different depths. An atrous residual feature refinement (AR) module and a deep atrous spatial pyramid pooling (DASPP) module were proposed to process the deep and shallow features from the backbone, respectively, and the deep and shallow features were then fused in parallel according to a specific dimension ratio. The proposed approach achieved a mean intersection over union (mIoU) of 77.13% on the PASCAL VOC 2012 dataset. Compared with current state-of-the-art semantic segmentation and real-time semantic segmentation algorithms, the proposed method strikes a better balance between real-time performance and segmentation accuracy, and offers good performance and practical value.



Key words: deep learning, fully convolutional neural network, semantic segmentation, feature fusion, atrous convolution
Received: 08 July 2019      Published: 28 August 2020
CLC:  TP 391  
Cite this article:

Deng-wen ZHOU,Jin-yue TIAN,Lu-yao MA,Xiu-xiu SUN. Lightweight image semantic segmentation based on multi-level feature cascaded network. Journal of ZheJiang University (Engineering Science), 2020, 54(8): 1516-1524.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.08.009     OR     http://www.zjujournals.com/eng/Y2020/V54/I8/1516


Fig.1 Visual comparison of the effective receptive fields of standard and atrous convolution
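The receptive-field difference in Fig.1 follows from how atrous (dilated) convolution spaces out its kernel taps. The sketch below is a minimal illustration assuming the standard definition, in which a k×k kernel with dilation rate r spans k + (k - 1)(r - 1) input pixels per side while keeping only k×k weights; the function name is hypothetical. If the r values in Tab.4 are dilation rates (an assumption, since r is not defined in this section), a 3×3 kernel at r = 18 would span 37 pixels.

```python
def effective_kernel_size(k: int, r: int) -> int:
    """Side length covered by a k x k kernel with dilation rate r.

    Atrous convolution inserts (r - 1) gaps between adjacent kernel taps,
    so the taps span k + (k - 1) * (r - 1) input pixels while the number
    of weights stays k * k.
    """
    if k < 1 or r < 1:
        raise ValueError("kernel size and dilation rate must be positive")
    return k + (k - 1) * (r - 1)


if __name__ == "__main__":
    # r = 1 is an ordinary convolution; 6, 12, 18 and 24 are the r values of
    # Tab.4, treated here (hypothetically) as dilation rates.
    for r in (1, 6, 12, 18, 24):
        span = effective_kernel_size(3, r)
        print(f"3x3 kernel, r={r:2d}: spans {span}x{span} pixels with 9 weights")
```

The point of the comparison is that the span grows linearly with r at no extra parameter or multiply-add cost, which is what lets atrous modules enlarge context cheaply.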
Fig.2 Network structure of proposed method
Input | Operator | T | C | N | S
$513^{2}\times 3$ | Conv2d | - | 32 | 1 | 2
$257^{2}\times 32$ | bottleneck | 1 | 16 | 1 | 2
$129^{2}\times 16$ | bottleneck | 6 | 24 | 2 | 2
$65^{2}\times 24$ | bottleneck | 6 | 32 | 3 | 2
$33^{2}\times 32$ | bottleneck | 6 | 64 | 4 | 1
$33^{2}\times 64$ | bottleneck | 6 | 96 | 3 | 1
$33^{2}\times 96$ | bottleneck | 6 | 160 | 3 | 1
$33^{2}\times 160$ | bottleneck | 6 | 320 | 1 | 1
Tab.1 Baseline network structure (MobileNetV2 notation [13]: T, expansion factor; C, output channels; N, number of repeats; S, stride)
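The Input column of Tab.1 can be reproduced from the S column. The sketch below assumes every stride is applied once per row by a 3×3 kernel with padding 1 (the usual MobileNetV2 setting, not stated in the table); under that assumption the four stride-2 rows shrink a 513×513 image to 33×33, a cumulative stride of 16. The helper names are hypothetical.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (operator, output channels C, stride S) for each row of Tab.1.
# Assumption: the stride is applied once per row by a 3x3 kernel with padding 1.
TAB1_ROWS = [
    ("Conv2d", 32, 2),
    ("bottleneck", 16, 2),
    ("bottleneck", 24, 2),
    ("bottleneck", 32, 2),
    ("bottleneck", 64, 1),
    ("bottleneck", 96, 1),
    ("bottleneck", 160, 1),
    ("bottleneck", 320, 1),
]


def trace_shapes(input_size: int = 513, in_channels: int = 3):
    """List (spatial size, channels) entering every row, plus the final output."""
    shapes = [(input_size, in_channels)]
    size = input_size
    for _op, channels, stride in TAB1_ROWS:
        size = conv_output_size(size, stride=stride)
        shapes.append((size, channels))
    return shapes


if __name__ == "__main__":
    shapes = trace_shapes()
    for (op, _, _), (size, ch) in zip(TAB1_ROWS, shapes):
        print(f"{op:10s} input {size}^2 x {ch}")
    total_stride = 1
    for _, _, stride in TAB1_ROWS:
        total_stride *= stride
    out_size, out_ch = shapes[-1]
    print(f"output {out_size}^2 x {out_ch}, cumulative stride {total_stride}")
```

Keeping the last four rows at stride 1 is what holds the deep features at 33×33 rather than continuing to downsample.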
Method | Baseline network | T0 / ms | mIoU / %
ENet[19] | ENet | 261 | 58.3
SQ[20] | SqueezeNet[21] | 781 | 59.8
ShuffleNetV2[22] | ShuffleNetV2 | 45 | 67.7
ICNet[23] | PSPNet50[5] | 176 | 70.2
Proposed method | MobileNetV2 | 18 | 70.6
Tab.2 Performance and speed comparison with different baseline networks on Cityscapes validation set
Fig.3 Bottleneck structure in network
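Fig.3 is the bottleneck block used throughout Tab.1. Assuming it is the standard MobileNetV2 inverted residual [13] (a 1×1 expansion by factor T, a 3×3 depthwise convolution, then a 1×1 linear projection; an assumption, since the figure itself is not reproduced here), its weight count splits as follows. Biases and batch-normalization parameters are ignored, and the function and table names are hypothetical.

```python
def bottleneck_weights(c_in: int, c_out: int, t: int, k: int = 3) -> dict:
    """Convolution weights of one MobileNetV2-style inverted residual block.

    Assumed layout (standard MobileNetV2; biases and BN ignored):
      expand    1x1 conv c_in -> t*c_in (omitted when t == 1)
      depthwise k x k conv, one filter per hidden channel
      project   1x1 linear conv t*c_in -> c_out
    """
    hidden = t * c_in
    return {
        "expand": c_in * hidden if t != 1 else 0,
        "depthwise": k * k * hidden,
        "project": hidden * c_out,
    }


# (c_in, T, c_out) of the first block in each bottleneck row of Tab.1;
# later repeats in a row have c_in == c_out.
TAB1_BOTTLENECKS = [
    (32, 1, 16), (16, 6, 24), (24, 6, 32), (32, 6, 64),
    (64, 6, 96), (96, 6, 160), (160, 6, 320),
]

if __name__ == "__main__":
    for c_in, t, c_out in TAB1_BOTTLENECKS:
        w = bottleneck_weights(c_in, c_out, t)
        print(f"{c_in:3d}->{c_out:3d} (T={t}): {w}, total {sum(w.values())}")
```

Under these assumptions, in every T = 6 row the two 1×1 convolutions hold more than 80% of the weights, while the 3×3 spatial filtering stays cheap because it is depthwise.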
Model | mIoU / % | Nf / (frames·s⁻¹)
A | 75.45 | 12.20
A+AR | 76.06 | 11.76
A+DASPP | 76.20 | 11.90
A+AR+DASPP | 77.13 | 11.49
Tab.3 Performance comparison of proposed model and model A on PASCAL VOC 2012 validation set
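The contribution of each module in Tab.3 is its difference from model A. The sketch below only redoes that arithmetic on the reported numbers and adds no new measurements: AR and DASPP add 0.61 and 0.75 mIoU points on their own and 1.68 points together, at a cost of roughly 3.6%, 2.5% and 5.8% of the frame rate. The dictionary and function names are hypothetical.

```python
# mIoU (%) and frame rate Nf (frames/s) as reported in Tab.3.
TAB3 = {
    "A": (75.45, 12.20),
    "A+AR": (76.06, 11.76),
    "A+DASPP": (76.20, 11.90),
    "A+AR+DASPP": (77.13, 11.49),
}


def ablation_deltas(table, baseline="A"):
    """Return {model: (mIoU gain in points, relative frame-rate change in %)}."""
    base_miou, base_fps = table[baseline]
    return {
        name: (
            round(miou - base_miou, 2),
            round(100 * (fps - base_fps) / base_fps, 1),
        )
        for name, (miou, fps) in table.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (gain, speed) in ablation_deltas(TAB3).items():
        print(f"{name:11s} mIoU {gain:+.2f} points, frame rate {speed:+.1f}%")
```

Read this way, the two modules are roughly additive in accuracy and each costs only a few percent of throughput.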
AR | r | mIoU / %
✓ | 24 | 76.26
✓ | 18 | 77.13
✓ | 12 | 76.79
✓ | 6 | 76.45
× | - | 76.20
Tab.4 Performance comparison of different AR settings on PASCAL VOC 2012 validation set
Model | P / 10⁶ | mIoU / %
DASPP (conv ×3) | 6.72 | 77.03
DASPP (conv ×2) | 6.52 | 77.13
DASPP (conv ×1) | 6.32 | 76.69
ASPP | 6.13 | 76.49
Tab.5 Performance comparison of different DASPP settings on PASCAL VOC 2012 validation set
Feature fusion | mIoU / %
$D$ | 75.80
$L8\cup D$ | 75.85
$L12\cup D$ | 75.93
$L15\cup D$ | 76.34
Tab.6 Performance comparison of adding different shallow layers on PASCAL VOC 2012 validation set
Feature fusion | mIoU / %
$D$ | 75.80
$(L12\cup L15)+D$ | 76.55
$L12\cup L15\cup D$ (1:1:1) | 76.70
$L12\cup L15\cup D$ | 77.13
Tab.7 Performance comparison of different layers cascade ways on PASCAL VOC 2012 validation set
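Tab.7 contrasts combining (L12 ∪ L15) with D by "+", concatenating L12, L15 and D with equal 1:1:1 widths, and concatenating them with the specific dimension ratio used by the final model (77.13%). The actual ratio and channel widths are not given in this section, so the sketch below uses a hypothetical total of 256 channels and a hypothetical 1:1:2 ratio only to show the mechanism: each branch receives its share of channels, and the branches are stacked along the channel axis. It also assumes "+" denotes element-wise addition and that all branches already share one spatial size. Feature maps are plain nested lists so the example has no dependencies; all names are hypothetical.

```python
from typing import List

Channel = List[List[float]]   # one H x W map
FeatureMap = List[Channel]    # C channels


def split_channels(total: int, ratio: List[int]) -> List[int]:
    """Divide `total` output channels among branches in proportion to `ratio`."""
    unit, remainder = divmod(total, sum(ratio))
    if remainder:
        raise ValueError("total must be divisible by the sum of the ratio")
    return [unit * part for part in ratio]


def concat_channels(branches: List[FeatureMap]) -> FeatureMap:
    """Parallel (concatenation) fusion: stack every branch's channels."""
    height, width = len(branches[0][0]), len(branches[0][0][0])
    for fmap in branches:
        for ch in fmap:
            if len(ch) != height or len(ch[0]) != width:
                raise ValueError("all branches must share the same spatial size")
    return [ch for fmap in branches for ch in fmap]


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Additive fusion: element-wise sum, so channel counts must match."""
    if len(a) != len(b):
        raise ValueError("additive fusion needs equal channel counts")
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def dummy_map(channels: int, h: int = 2, w: int = 2, value: float = 0.0) -> FeatureMap:
    """Constant-valued placeholder feature map for the example."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]


# Hypothetical widths for L12 : L15 : D under a 1:1:2 split of 256 channels.
widths = split_channels(256, [1, 1, 2])
fused = concat_channels([dummy_map(c) for c in widths])
print(widths, len(fused))
```

Concatenation lets the ratio decide how much capacity each level contributes to the fused representation, whereas addition forces the branches to share a single channel space.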
Method | P / 10⁶ | mIoU / %
FCN-8s[1] | 134.50 | 67.20
DeepLab[3] | 44.04 | 71.60
DeepLabv3+[2] | 44.61 | 87.80
Proposed method | 6.52 | 77.13
Tab.8 Comparison of proposed model and other high-performance semantic segmentation methods on PASCAL VOC 2012 test set
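Tab.8 trades accuracy against model size, and dividing the parameter counts makes that trade explicit. The sketch below only re-derives ratios from the table: FCN-8s has about 20.6 times, and DeepLab and DeepLabv3+ about 6.8 times, as many parameters as the proposed model, while DeepLabv3+ remains 10.67 mIoU points more accurate. The names are hypothetical.

```python
# Parameter counts (millions) and mIoU (%) as reported in Tab.8.
TAB8 = {
    "FCN-8s": (134.50, 67.20),
    "DeepLab": (44.04, 71.60),
    "DeepLabv3+": (44.61, 87.80),
    "Proposed": (6.52, 77.13),
}


def size_ratios(table, reference="Proposed"):
    """How many times more parameters each method has than the reference model."""
    ref_params = table[reference][0]
    return {
        name: round(params / ref_params, 1)
        for name, (params, _miou) in table.items()
        if name != reference
    }


if __name__ == "__main__":
    ref_miou = TAB8["Proposed"][1]
    for name, ratio in size_ratios(TAB8).items():
        gap = TAB8[name][1] - ref_miou
        print(f"{name:10s} {ratio:5.1f}x parameters, mIoU {gap:+.2f} points")
```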
Method | T0 / ms | Nf / (frames·s⁻¹) | mIoU / %
SegNet[24] | 60 | 16.70 | 57.0
CRF-RNN[25] | 700 | 1.43 | 62.5
DeepLab[3] | 400 | 2.50 | 63.1
FCN-8s[1] | 500 | 2.00 | 65.3
Dilation[12] | 4000 | 0.25 | 67.1
DeepLabv3+[2] | 350 | 2.86 | 82.1
Proposed method | 18 | 55.60 | 70.6
Tab.9 Comparison of network performance of proposed model and other high-performance semantic segmentation methods on Cityscapes test set
Method | T0 / ms | Nf / (frames·s⁻¹) | mIoU / %
ENet[19] | 13 | 76.9 | 58.3
ERFNet[26] | 89 | 11.2 | 69.7
Proposed method | 18 | 55.6 | 70.6
Tab.10 Comparison of proposed model and other real-time semantic segmentation methods on Cityscapes test set
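The Nf columns of Tab.9 and Tab.10 agree with Nf = 1000 / T0 (T0 in milliseconds) to within rounding, i.e. the frame rate is the reciprocal of single-image inference time. The sketch below checks that relationship against the reported values; it assumes images are processed one at a time, so it says nothing about batched or pipelined throughput. The names are hypothetical.

```python
def frames_per_second(latency_ms: float) -> float:
    """Frame rate implied by a per-image inference time, with no batching."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# T0 (ms) and reported Nf (frames/s) from Tab.9 and Tab.10.
REPORTED = {
    "SegNet": (60, 16.70),
    "CRF-RNN": (700, 1.43),
    "DeepLab": (400, 2.50),
    "FCN-8s": (500, 2.00),
    "Dilation": (4000, 0.25),
    "DeepLabv3+": (350, 2.86),
    "ENet": (13, 76.9),
    "ERFNet": (89, 11.2),
    "Proposed": (18, 55.6),
}

for name, (t0, nf) in REPORTED.items():
    print(f"{name:10s} T0={t0:5d} ms  1000/T0={frames_per_second(t0):6.2f}  reported Nf={nf}")
```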
Fig.4 Visual comparison of results from the proposed method and other methods on the Cityscapes test set
Fig.5 Visual comparison of results from the proposed method and other methods on the PASCAL VOC 2012 test set
[1] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 39(4): 640-651.
[2] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// European Conference on Computer Vision. Munich: Springer, 2018: 801-818.
[3] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. International Conference on Learning Representations, 2014(4): 357-361.
[4] LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 5168-5177.
[5] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[J]. IEEE Conference on Computer Vision and Pattern Recognition, 2017(1): 2881-2890.
[6] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 1251-1258.
[7] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2017-04-17]. https://arxiv.org/abs/1704.04861.
[8] BILINSKI P, PRISACARIU V. Dense decoder shortcut connections for single-pass semantic segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6596-6605.
[9] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]// European Conference on Computer Vision. Zurich: Springer, 2014: 818-833.
[10] HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(9): 1904-1916.
[11] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2017-06-17]. https://arxiv.org/abs/1706.05587.
[12] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2015-11-23]. https://arxiv.org/abs/1511.07122.
[13] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[14] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[15] CIRESAN D, GIUSTI A, GAMBARDELLA L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[C]// Advances in Neural Information Processing Systems. Lake Tahoe: MIT Press, 2012: 2843-2851.
[16] FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1915-1929. doi: 10.1109/TPAMI.2012.231
[17] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136. doi: 10.1007/s11263-014-0733-5
[18] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 3213-3223.
[19] PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. [2016-06-07]. https://arxiv.org/abs/1606.02147.
[20] TREML M, ARJONA-MEDINA J, UNTERTHINER T, et al. Speeding up semantic segmentation for autonomous driving[C]// Neural Information Processing Systems Workshop. Barcelona: MIT Press, 2016.
[21] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size[EB/OL]. [2016-02-24]. https://arxiv.org/abs/1602.07360.
[22] TURKMEN S, HEIKKILA J. An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions[EB/OL]. [2019-02-20]. https://arxiv.org/abs/1902.07476.
[23] ZHAO H, QI X, SHEN X, et al. ICNet for real-time semantic segmentation on high-resolution images[EB/OL]. [2017-04-27]. https://arxiv.org/abs/1704.08545.
[24] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for scene segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2017.2701373
[25] ZHENG S, JAYASUMANA S, ROMERA-PAREDES B, et al. Conditional random fields as recurrent neural networks[C]// IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1529-1537.