Journal of Zhejiang University (Engineering Science), 2020, Vol. 54, Issue 8: 1516-1524. DOI: 10.3785/j.issn.1008-973X.2020.08.009
Computer Technology
Lightweight image semantic segmentation based on multi-level feature cascaded network
Deng-wen ZHOU, Jin-yue TIAN, Lu-yao MA, Xiu-xiu SUN
School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Abstract:

Current semantic segmentation algorithms generally have complex network structures and heavy computational costs. To improve both the real-time performance and the accuracy of semantic segmentation, a computationally efficient lightweight image semantic segmentation network based on a multi-level feature cascaded network (LSSN) was proposed. The network jointly considers parameter count, running speed and accuracy, so it can be deployed more readily on embedded and mobile devices. A fine-tuned deep convolutional classification network was used as the feature extractor to obtain semantic and location features from both deep and shallow layers. An atrous residual feature refinement module and a deep atrous spatial pyramid pooling module were proposed to process, respectively, the deep features and the shallow features coming from the backbone. The deep and shallow features were then fused in parallel according to a specific channel-dimension ratio. The proposed method achieved a mean intersection over union (mIoU) of 77.13% on the PASCAL VOC 2012 dataset. Compared with current high-performance and real-time semantic segmentation algorithms, it offers a better balance between real-time performance and segmentation accuracy and has good practical value.

Key words: deep learning; fully convolutional neural network; semantic segmentation; feature fusion; atrous convolution
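Received: 2019-07-08; Published: 2020-08-28
CLC: TP 391
Funding: Fundamental Research Funds for the Central Universities (2018ZD06)
About the author: ZHOU Deng-wen (1965-), male, professor. His research focuses on deep-learning-based image processing and computer vision. orcid.org/0000-0001-9612-0215. E-mail: zdw@ncepu.edu.cn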
Cite this article:

Deng-wen ZHOU, Jin-yue TIAN, Lu-yao MA, Xiu-xiu SUN. Lightweight image semantic segmentation based on multi-level feature cascaded network. Journal of Zhejiang University (Engineering Science), 2020, 54(8): 1516-1524.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.08.009        http://www.zjujournals.com/eng/CN/Y2020/V54/I8/1516

Fig. 1  Visual comparison of the effective receptive fields of standard convolution and atrous convolution
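Figure 1 contrasts standard and atrous (dilated) convolution. The sketch below complements the figure with arithmetic. It computes the theoretical receptive field along one axis, which is not the visualized effective receptive field. A k-tap kernel with rate r spans k + (k - 1)(r - 1) input positions, and stacked layers grow the field through the usual recurrence. The layer configurations are illustrative assumptions, not settings taken from the paper.

```python
def effective_kernel(k: int, rate: int) -> int:
    """Span of a k-tap kernel with dilation `rate`: k + (k - 1) * (rate - 1)."""
    return k + (k - 1) * (rate - 1)


def receptive_field(layers) -> int:
    """Theoretical receptive field (one axis) of stacked (kernel, rate, stride) layers."""
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent output positions
    for k, rate, stride in layers:
        rf += (effective_kernel(k, rate) - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    standard = [(3, 1, 1)] * 3  # three plain 3x3 convolutions, stride 1
    atrous = [(3, 2, 1)] * 3    # the same three layers with rate 2
    print(receptive_field(standard))  # 7
    print(receptive_field(atrous))    # 13
    print(effective_kernel(3, 18))    # 37
```

Dilation enlarges the receptive field without adding weights: each dilated 3x3 layer above still has 9 weights per channel pair.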
Fig. 2  Network architecture of the proposed method
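| Input | Operator | T (expansion factor) | C (output channels) | N (repeats) | S (stride) |
| --- | --- | --- | --- | --- | --- |
| $513^{2}\times 3$ | Conv2d | - | 32 | 1 | 2 |
| $257^{2}\times 32$ | bottleneck | 1 | 16 | 1 | 2 |
| $129^{2}\times 16$ | bottleneck | 6 | 24 | 2 | 2 |
| $65^{2}\times 24$ | bottleneck | 6 | 32 | 3 | 2 |
| $33^{2}\times 32$ | bottleneck | 6 | 64 | 4 | 1 |
| $33^{2}\times 64$ | bottleneck | 6 | 96 | 3 | 1 |
| $33^{2}\times 96$ | bottleneck | 6 | 160 | 3 | 1 |
| $33^{2}\times 160$ | bottleneck | 6 | 320 | 1 | 1 |

Table 1  Backbone network structure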
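As a consistency check on Table 1, the sketch below propagates a 513 x 513 input through the listed rows. It recovers each row's input size and the overall output stride of 16. It rests on two assumptions. Every strided layer is taken to use a 3 x 3 kernel with padding 1, as in MobileNetV2. Within a row, only the first of the N repeated blocks is taken to apply stride S.

```python
# Rows of Table 1 as (operator, t, c, n, s); t is None for the plain convolution.
BACKBONE = [
    ("conv2d", None, 32, 1, 2),
    ("bottleneck", 1, 16, 1, 2),
    ("bottleneck", 6, 24, 2, 2),
    ("bottleneck", 6, 32, 3, 2),
    ("bottleneck", 6, 64, 4, 1),
    ("bottleneck", 6, 96, 3, 1),
    ("bottleneck", 6, 160, 3, 1),
    ("bottleneck", 6, 320, 1, 1),
]


def conv_out(size: int, stride: int, k: int = 3) -> int:
    """Spatial output size of a k x k convolution with padding k // 2."""
    pad = k // 2
    return (size + 2 * pad - k) // stride + 1


def trace_shapes(size: int, channels: int = 3):
    """(spatial size, channels) of the network input, then of each row's output.

    Only the first of the n blocks in a row is strided; the remaining blocks use
    stride 1 and keep the spatial size, so one conv_out call per row suffices.
    """
    shapes = [(size, channels)]
    for _op, _t, c, _n, s in BACKBONE:
        size = conv_out(size, s)
        shapes.append((size, c))
    return shapes


def output_stride() -> int:
    """Overall downsampling factor of the backbone (product of the row strides)."""
    total = 1
    for *_, s in BACKBONE:
        total *= s
    return total


if __name__ == "__main__":
    for shape in trace_shapes(513):
        print(shape)
    print("output stride:", output_stride())  # 16
```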
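| Method | Backbone | Running time T0 / ms | mIoU / % |
| --- | --- | --- | --- |
| ENet[19] | ENet | 261 | 58.3 |
| SQ[20] | SqueezeNet[21] | 781 | 59.8 |
| ShuffleNetV2[22] | ShuffleNetV2 | 45 | 67.7 |
| ICNet[23] | PSPNet50[5] | 176 | 70.2 |
| Proposed method | MobileNetV2 | 18 | 70.6 |

Table 2  Performance and running speed of different backbone networks on the Cityscapes validation set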
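The mIoU figures reported throughout are mean intersection over union. The sketch below restates the standard definition, since the paper's own evaluation code is not shown here. It computes mIoU from a confusion matrix of pixel counts accumulated over an evaluation set. The toy matrix is invented for illustration. Skipping classes that are absent from both the ground truth and the prediction is an assumed convention.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix conf[gt][pred] of pixel counts.

    IoU_c = TP / (TP + FP + FN). Classes that appear in neither the ground truth
    nor the prediction are skipped (an assumed convention).
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c pixels labelled as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # other pixels labelled as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    toy = [[3, 1],
           [1, 5]]
    print(round(mean_iou(toy), 4))  # 0.6571
```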
Fig. 3  Structure of the bottleneck block used in the network
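Figure 3 shows the bottleneck used in the network. The sketch below assumes it is the MobileNetV2 inverted residual block implied by Table 1 and reference [13]. That block has four parts:

- a 1x1 expansion by factor t;
- a 3x3 depthwise convolution;
- a linear 1x1 projection;
- an identity shortcut when the input and output shapes match.

Under that assumption, the sketch counts the block's weights and compares a depthwise separable 3x3 convolution with a standard one. Biases and BatchNorm parameters are ignored, and the channel sizes are illustrative.

```python
def bottleneck_params(c_in: int, c_out: int, t: int, k: int = 3) -> int:
    """Weights in an inverted residual bottleneck (biases and BatchNorm ignored).

    1x1 expansion to t * c_in channels (omitted when t == 1), a k x k depthwise
    convolution, then a linear 1x1 projection to c_out channels.
    """
    hidden = c_in * t
    expand = c_in * hidden if t != 1 else 0
    depthwise = k * k * hidden  # one k x k filter per channel
    project = hidden * c_out
    return expand + depthwise + project


def separable_vs_standard(c_in: int, c_out: int, k: int = 3):
    """(depthwise separable, standard) weight counts for a k x k convolution."""
    return k * k * c_in + c_in * c_out, k * k * c_in * c_out


def uses_shortcut(c_in: int, c_out: int, stride: int) -> bool:
    """The identity shortcut is added only when input and output shapes match."""
    return stride == 1 and c_in == c_out


if __name__ == "__main__":
    print(bottleneck_params(64, 64, t=6))   # 52608
    print(separable_vs_standard(384, 64))   # (28032, 221184)
    print(uses_shortcut(64, 64, stride=1))  # True
```

At the 384-channel expanded width, the depthwise separable convolution needs roughly one eighth of the weights of a standard 3x3 convolution. This saving is what keeps the block lightweight.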
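| Model | mIoU / % | Frame rate Nf / (frame·s$^{-1}$) |
| --- | --- | --- |
| A | 75.45 | 12.20 |
| A+AR | 76.06 | 11.76 |
| A+DASPP | 76.20 | 11.90 |
| A+AR+DASPP | 77.13 | 11.49 |

Table 3  Performance comparison between the proposed network structure and model A on the PASCAL VOC 2012 validation set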
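| AR | r | mIoU / % |
| --- | --- | --- |
| ✓ | 24 | 76.26 |
| ✓ | 18 | 77.13 |
| ✓ | 12 | 76.79 |
| ✓ | 6 | 76.45 |
| × | - | 76.20 |

Table 4  Performance comparison of different AR settings on the PASCAL VOC 2012 validation set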
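This excerpt does not detail the internals of the AR (atrous residual) module. The sketch below is a hypothetical illustration of what a rate r like the one in Table 4 would control, assuming r is the dilation rate. It applies a single-channel 1D atrous convolution inside a residual unit with an identity shortcut. The kernel weights, the 1D setting and the function names are assumptions, not the paper's design.

```python
def atrous_conv1d(x, weights, rate):
    """'Same'-length 1D atrous convolution of one channel, zero padding at the borders.

    Computed as cross-correlation, as deep-learning frameworks do: tap j of an
    odd-length kernel reads x[i + (j - centre) * rate].
    """
    centre = (len(weights) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - centre) * rate
            if 0 <= idx < len(x):  # taps outside the signal read zero padding
                acc += w * x[idx]
        out.append(acc)
    return out


def atrous_residual(x, weights, rate):
    """Hypothetical residual unit: identity shortcut plus one atrous-convolution branch."""
    branch = atrous_conv1d(x, weights, rate)
    return [a + b for a, b in zip(x, branch)]


if __name__ == "__main__":
    impulse = [0.0] * 9
    impulse[4] = 1.0
    print(atrous_conv1d(impulse, [1.0, 1.0, 1.0], rate=2))
    print(atrous_residual(impulse, [1.0, 1.0, 1.0], rate=2))
    # With rate 6 the outer taps can no longer reach the impulse from any in-range
    # output position, so only the centre tap contributes.
    print(atrous_conv1d(impulse, [1.0, 1.0, 1.0], rate=6))
```

The impulse response shows the same 3 taps spreading further apart as the rate grows. On a short signal, large rates push the outer taps into the zero padding.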
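| Model | Parameters P / $10^{6}$ | mIoU / % |
| --- | --- | --- |
| DASPP (conv $\times$3) | 6.72 | 77.03 |
| DASPP (conv $\times$2) | 6.52 | 77.13 |
| DASPP (conv $\times$1) | 6.32 | 76.69 |
| ASPP | 6.13 | 76.49 |

Table 5  Performance comparison of different DASPP settings on the PASCAL VOC 2012 validation set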
| Feature fusion | mIoU / % |
| --- | --- |
| $D$ | 75.80 |
| $L8\cup D$ | 75.85 |
| $L12\cup D$ | 75.93 |
| $L15\cup D$ | 76.34 |

Table 6  Performance of fusing different shallow features with the deep features D on the PASCAL VOC 2012 validation set
| Feature fusion | mIoU / % |
| --- | --- |
| $D$ | 75.80 |
| $(L12\cup L15)+D$ | 76.55 |
| $L12\cup L15\cup D$ (1:1:1) | 76.70 |
| $L12\cup L15\cup D$ | 77.13 |

Table 7  Performance comparison of different ways of cascading deep and shallow features on the PASCAL VOC 2012 validation set
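Tables 6 and 7 compare ways of merging the deep features D with shallow features such as L12 and L15. The options include an equal 1:1:1 split and the paper's chosen ratio, which this excerpt does not state. The sketch below assumes that ∪ denotes channel-wise concatenation, applied after each branch has been projected to its share of the output channels and resized to a common spatial size. Under that assumption it shows the bookkeeping involved. The helper names and the 1:1:2 ratio are hypothetical.

```python
def split_channels(total: int, ratio) -> list:
    """Allocate `total` output channels to branches in proportion to `ratio`."""
    weight = sum(ratio)
    alloc = [total * r // weight for r in ratio]
    alloc[-1] += total - sum(alloc)  # hand any rounding remainder to the last branch
    return alloc


def concat_channels(*features):
    """Channel-wise concatenation of feature maps stored as [channel][row][col] lists."""
    rows, cols = len(features[0][0]), len(features[0][0][0])
    out = []
    for f in features:
        if len(f[0]) != rows or len(f[0][0]) != cols:
            raise ValueError("spatial sizes must match before concatenation")
        out.extend(f)
    return out


def zeros(channels: int, rows: int = 2, cols: int = 2):
    """Placeholder feature map of the given shape (hypothetical helper)."""
    return [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]


if __name__ == "__main__":
    print(split_channels(256, (1, 1, 1)))  # [85, 85, 86]
    print(split_channels(256, (1, 1, 2)))  # [64, 64, 128]
    l12, l15, d = (zeros(c) for c in split_channels(8, (1, 1, 2)))
    print(len(concat_channels(l12, l15, d)))  # 8 channels after fusion
```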
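| Method | Parameters P / $10^{6}$ | mIoU / % |
| --- | --- | --- |
| FCN-8s[1] | 134.50 | 67.20 |
| DeepLab[3] | 44.04 | 71.60 |
| DeepLabv3+[2] | 44.61 | 87.80 |
| Proposed method | 6.52 | 77.13 |

Table 8  Performance comparison between the proposed algorithm and high-performance semantic segmentation methods on the PASCAL VOC 2012 test set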
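To put Table 8 in proportion, the short calculation below divides each method's parameter count by that of the proposed network. It uses only the numbers in the table.

```python
# Parameter counts in millions, copied from Table 8.
TABLE_8_PARAMS = {"FCN-8s": 134.50, "DeepLab": 44.04, "DeepLabv3+": 44.61}
PROPOSED_PARAMS = 6.52


def param_ratios(table, proposed):
    """How many times more parameters each method has than the proposed network."""
    return {method: p / proposed for method, p in table.items()}


if __name__ == "__main__":
    for method, ratio in param_ratios(TABLE_8_PARAMS, PROPOSED_PARAMS).items():
        print(f"{method}: {ratio:.1f}x")
```

FCN-8s has about 20.6 times as many parameters as the proposed network, and both DeepLab variants have about 6.8 times as many.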
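| Method | Running time T0 / ms | Frame rate Nf / (frame·s$^{-1}$) | mIoU / % |
| --- | --- | --- | --- |
| SegNet[24] | 60 | 16.70 | 57.0 |
| CRF-RNN[25] | 700 | 1.43 | 62.5 |
| DeepLab[3] | 400 | 2.50 | 63.1 |
| FCN-8s[1] | 500 | 2.00 | 65.3 |
| Dilation[12] | 4000 | 0.25 | 67.1 |
| DeepLabv3+[2] | 350 | 2.86 | 82.1 |
| Proposed method | 18 | 55.60 | 70.6 |

Table 9  Performance comparison between the proposed algorithm and high-performance semantic segmentation networks on the Cityscapes test set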
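In Tables 9 and 10, the frame-rate column Nf appears to be the reciprocal of the per-frame running time T0, that is, Nf = 1000 / T0 with T0 in ms. The check below recomputes Nf from T0 for Table 9 using only the table's numbers. It implies nothing about the hardware or batch size behind the timings, which this excerpt does not state.

```python
def fps_from_latency(ms: float) -> float:
    """Frames per second implied by a per-frame running time of `ms` milliseconds."""
    return 1000.0 / ms


# (T0 / ms, reported Nf / fps) pairs copied from Table 9.
TABLE_9 = {
    "SegNet": (60, 16.70),
    "CRF-RNN": (700, 1.43),
    "DeepLab": (400, 2.50),
    "FCN-8s": (500, 2.00),
    "Dilation": (4000, 0.25),
    "DeepLabv3+": (350, 2.86),
    "Proposed": (18, 55.60),
}

if __name__ == "__main__":
    for name, (t0, nf) in TABLE_9.items():
        print(f"{name:11s} T0 = {t0:5d} ms -> {fps_from_latency(t0):6.2f} fps (reported {nf:.2f})")
```

Every recomputed value agrees with the reported Nf to within rounding. The same relation holds for the ENet and ERFNet rows of Table 10.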
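| Method | Running time T0 / ms | Frame rate Nf / (frame·s$^{-1}$) | mIoU / % |
| --- | --- | --- | --- |
| ENet[19] | 13 | 76.9 | 58.3 |
| ERFNet[26] | 89 | 11.2 | 69.7 |
| Proposed method | 18 | 55.6 | 70.6 |

Table 10  Performance comparison between the proposed algorithm and real-time semantic segmentation networks on the Cityscapes test set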
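Fig. 4  Visual comparison of the results of the proposed method and other methods on the Cityscapes test set
Fig. 5  Visual comparison of the results of the proposed method and other methods on the PASCAL VOC 2012 test set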
References:
1 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
2 CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// European Conference on Computer Vision. Munich: Springer, 2018: 801-818.
3 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J]. International Conference on Learning Representations, 2014(4): 357-361.
4 LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 5168-5177.
5 ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 2881-2890.
6 CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 1251-1258.
7 HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2017-04-17]. https://arxiv.org/abs/1704.04861.
8 BILINSKI P, PRISACARIU V. Dense decoder shortcut connections for single-pass semantic segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 6596-6605.
9 ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]// European Conference on Computer Vision. Zurich: Springer, 2014: 818-833.
10 HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
11 CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. [2017-06-17]. https://arxiv.org/abs/1706.05587.
12 YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2015-11-23]. https://arxiv.org/abs/1511.07122.
13 SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
14 RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. DOI: 10.1007/s11263-015-0816-y.
15 CIRESAN D, GIUSTI A, GAMBARDELLA L M, et al. Deep neural networks segment neuronal membranes in electron microscopy images[C]// Advances in Neural Information Processing Systems. Lake Tahoe: MIT Press, 2012: 2843-2851.
16 FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1915-1929. DOI: 10.1109/TPAMI.2012.231.
17 EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The PASCAL visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2015, 111(1): 98-136. DOI: 10.1007/s11263-014-0733-5.
18 CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 3213-3223.
19 PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation[EB/OL]. [2016-06-07]. https://arxiv.org/abs/1606.02147.
20 TREML M, ARJONA-MEDINA J, UNTERTHINER T, et al. Speeding up semantic segmentation for autonomous driving[C]// Neural Information Processing Systems Workshop. Barcelona: MIT Press, 2016.
21 IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size[EB/OL]. [2016-02-24]. https://arxiv.org/abs/1602.07360.
22 TÜRKMEN S, HEIKKILÄ J. An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions[EB/OL]. [2019-02-20]. https://arxiv.org/abs/1902.07476.
23 ZHAO H, QI X, SHEN X, et al. ICNet for real-time semantic segmentation on high-resolution images[EB/OL]. [2017-04-27]. https://arxiv.org/abs/1704.08545.
24 BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. DOI: 10.1109/TPAMI.2017.2701373.
25 ZHENG S, JAYASUMANA S, ROMERA-PAREDES B, et al. Conditional random fields as recurrent neural networks[C]// IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1529-1537.