Journal of Zhejiang University (Engineering Science), 2023, Vol. 57, Issue 7: 1278-1286    DOI: 10.3785/j.issn.1008-973X.2023.07.002
Automation Technology
Lightweight semantic segmentation network for underwater images
Hao-ran GUO(),Ji-chang GUO*(),Yu-dong WANG
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Abstract:

A lightweight semantic segmentation network was designed for underwater images. Considering the trade-off between speed and accuracy, a lightweight and efficient encoder-decoder architecture was adopted. In the encoder, an inverted bottleneck layer and a pyramid pooling module were designed to extract features efficiently. In the decoder, a feature fusion module was constructed to fuse multi-level features, which improved segmentation accuracy. To address the blurred edges typical of underwater images, an auxiliary edge loss function was used to train the network better, refining segmentation edges through the supervision of semantic boundaries. Experimental results on the underwater semantic segmentation dataset SUIM show that, for a 320×256-pixel input image, the network achieved a mean IoU of 53.55% at an inference speed of 258.94 frames per second on an NVIDIA GeForce GTX 1080 Ti graphics card, reaching real-time processing speed while maintaining high accuracy.

Key words: image processing; underwater image; semantic segmentation; edge feature; lightweight network
Received: 2022-07-27    Published: 2023-07-17
CLC: TP 391
Foundation item: National Natural Science Foundation of China (No. 62171315)
Corresponding author: Ji-chang GUO. E-mail: 2568971284@qq.com; jcguo@tju.edu.cn
About the author: Hao-ran GUO (1997—), male, master's student, engaged in research on semantic segmentation. orcid.org/0000-0001-7959-4021. E-mail: 2568971284@qq.com
Cite this article:

Hao-ran GUO, Ji-chang GUO, Yu-dong WANG. Lightweight semantic segmentation network for underwater images. Journal of Zhejiang University (Engineering Science), 2023, 57(7): 1278-1286.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.07.002        https://www.zjujournals.com/eng/CN/Y2023/V57/I7/1278

Fig. 1  Overall architecture of the proposed lightweight semantic segmentation network for underwater scenes
Module | Module type | Output size (W×H×C)
Module 1-1 | 3×3 convolution (s = 2) | 160×128×12
Module 1-2 | 3×3 convolution (s = 2) | 80×64×12
Module 2-1 | Inverted bottleneck (r = 3, s = 2) | 40×32×24
Module 2-2 | Inverted bottleneck (r = 6, s = 1) | 40×32×24
Module 3-1 | Inverted bottleneck (r = 3, s = 2) | 20×16×48
Module 3-2 | Inverted bottleneck (r = 6, s = 1) | 20×16×48
Module 4-1 | Inverted bottleneck (r = 3, s = 2) | 10×8×96
Module 4-2 | Inverted bottleneck (r = 6, s = 1) | 10×8×96
Module 4-3 | Inverted bottleneck (r = 12, s = 1) | 10×8×96
Module 4-4 | Inverted bottleneck (r = 18, s = 1) | 10×8×96
Module 5 | Pyramid pooling module | 10×8×48
Table 1  Encoder composition of the proposed network
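Table 1 lists r and s for each inverted bottleneck but does not restate the layer's internal design, which builds on the inverted residual of MobileNetV2 [23] (1×1 expansion, 3×3 depthwise convolution, 1×1 linear projection). The PyTorch sketch below is a minimal reading of that structure, treating r as the expansion ratio and s as the depthwise stride; the normalization and activation choices here are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """MobileNetV2-style inverted bottleneck: expand -> depthwise -> project."""

    def __init__(self, in_ch: int, out_ch: int, r: int, s: int):
        super().__init__()
        hidden = in_ch * r
        self.use_residual = (s == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),        # 1x1 pointwise expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=s, padding=1,
                      groups=hidden, bias=False),           # 3x3 depthwise conv, stride s
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),       # 1x1 linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Module 2-1 in Table 1: 80x64x12 in, 40x32x24 out (r = 3, s = 2)
layer = InvertedBottleneck(12, 24, r=3, s=2)
y = layer(torch.randn(1, 12, 64, 80))   # -> torch.Size([1, 24, 32, 40])
```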
Fig. 2  Structures of the inverted bottleneck layer and the pyramid pooling module
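For the pyramid pooling branch of Fig. 2, the paper's exact pooling scales are not reproduced on this page. A sketch in the spirit of PSPNet [7], assuming the common (1, 2, 3, 6) bin sizes and mapping the 96-channel encoder output to the 48 channels of Module 5 in Table 1, could look as follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: pool at several scales, then fuse."""

    def __init__(self, in_ch: int, out_ch: int, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_ch // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                    # pool to b x b
                nn.Conv2d(in_ch, branch_ch, 1, bias=False), # compress channels
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])
        self.project = nn.Sequential(                       # fuse input + branches
            nn.Conv2d(in_ch + branch_ch * len(bins), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [
            F.interpolate(b(x), size=(h, w), mode='bilinear', align_corners=False)
            for b in self.branches
        ]
        return self.project(torch.cat(feats, dim=1))

# Module 5 in Table 1: 96 -> 48 channels at 10x8 resolution
ppm = PyramidPooling(96, 48)
y = ppm(torch.randn(1, 96, 8, 10))   # -> torch.Size([1, 48, 8, 10])
```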
Module | Module type | Output size (W×H×C)
Stage 1 | Feature fusion module | 20×16×48
Stage 2 | Feature fusion module | 40×32×24
Stage 3 | Upsampling (8×) | 320×256×N
Table 2  Decoder composition of the proposed network
Fig. 3  Structure of the feature fusion module
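The paper's feature fusion design is given in Fig. 3 rather than in text here. As a generic stand-in, the sketch below upsamples the deeper, lower-resolution feature, aligns both inputs to a common channel count with 1×1 convolutions, sums them, and refines the result with a 3×3 convolution; the additive fusion and the refinement stage are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Fuse a deep, low-resolution feature with a shallow, high-resolution one."""

    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int):
        super().__init__()
        self.reduce_deep = nn.Sequential(       # align deep channels to out_ch
            nn.Conv2d(deep_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.reduce_shallow = nn.Sequential(    # align shallow channels to out_ch
            nn.Conv2d(shallow_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.refine = nn.Sequential(            # smooth the summed features
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, shallow):
        deep = F.interpolate(self.reduce_deep(deep), size=shallow.shape[2:],
                             mode='bilinear', align_corners=False)
        return self.refine(deep + self.reduce_shallow(shallow))

# Decoder stage 1 in Table 2: fuse the 10x8x48 PPM output with a 20x16x48 encoder feature
ffm = FeatureFusion(48, 48, 48)
y = ffm(torch.randn(1, 48, 8, 10), torch.randn(1, 48, 16, 20))  # -> (1, 48, 16, 20)
```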
Fig. 4  Edge feature extraction from ground-truth semantic labels and segmentation results
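Fig. 4 shows edge features extracted from ground-truth labels and predictions to supervise the auxiliary edge loss. One common way to obtain such a boundary map from an integer label map, not necessarily the paper's method, is a morphological-gradient test, sketched below: a pixel counts as an edge pixel if its local dilation and erosion disagree.

```python
import torch
import torch.nn.functional as F

def label_edges(label: torch.Tensor, width: int = 3) -> torch.Tensor:
    """Extract a binary boundary map from an integer semantic label map (N, H, W).

    A pixel is marked as an edge if any different class value appears within its
    width x width neighbourhood (a morphological-gradient test).
    """
    x = label.float().unsqueeze(1)                      # (N, 1, H, W)
    pad = width // 2
    dilated = F.max_pool2d(x, width, stride=1, padding=pad)     # local max
    eroded = -F.max_pool2d(-x, width, stride=1, padding=pad)    # local min
    return (dilated != eroded).float()                  # 1.0 on class boundaries

labels = torch.randint(0, 8, (2, 256, 320))             # 8 SUIM classes, 320x256 images
edges = label_edges(labels)                             # boundary supervision target
```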
Fig. 5  Comparison of experimental results of the proposed network and classical networks on the SUIM dataset
Fig. 6  Comparison of experimental results of the proposed network and classical networks on the seagrass dataset
Fig. 7  Failure cases of the proposed network on the SUIM dataset
Per-class IoU/% (class abbreviations follow SUIM [13]: BW = background waterbody, HD = human divers, PF = plants/sea-grass, WR = wrecks/ruins, RO = robots, RI = reefs/invertebrates, FV = fish/vertebrates, SR = sea-floor/rocks), mean IoU, and pixel accuracy:

Model | BW | HD | PF | WR | RO | RI | FV | SR | mIoU/% | PA/%
Proposed method | 84.62 | 63.99 | 18.46 | 41.84 | 61.93 | 53.44 | 46.00 | 58.42 | 53.55 | 85.32
U-Net [3] | 79.46 | 32.25 | 21.85 | 33.94 | 23.65 | 50.28 | 38.16 | 42.16 | 39.85 | 79.44
SegNet [2] | 80.63 | 45.67 | 17.45 | 32.24 | 55.72 | 47.62 | 43.92 | 51.51 | 46.85 | 82.19
DeepLab [4] | 81.82 | 50.26 | 17.05 | 43.33 | 63.60 | 57.18 | 43.59 | 55.35 | 51.52 | 84.27
PSPNet [7] | 82.51 | 65.04 | 28.54 | 46.56 | 62.88 | 55.80 | 46.78 | 55.98 | 55.51 | 86.41
GCN [24] | 79.32 | 38.57 | 15.09 | 30.38 | 54.25 | 49.94 | 36.09 | 52.02 | 44.46 | 81.28
OCNet [15] | 83.14 | 64.03 | 24.31 | 43.11 | 61.78 | 54.92 | 47.41 | 54.97 | 54.30 | 85.89
SUIM-Net [13] | 80.64 | 63.45 | 23.27 | 41.25 | 60.89 | 53.12 | 46.02 | 57.12 | 53.22 | 85.22
LEDNet [19] | 82.96 | 58.47 | 18.02 | 42.86 | 50.96 | 58.13 | 46.13 | 54.99 | 51.36 | 84.25
BiSeNetV2 [21] | 83.67 | 59.29 | 18.27 | 39.58 | 56.54 | 58.16 | 47.33 | 56.93 | 52.47 | 84.96
ENet [14] | 80.94 | 50.60 | 16.97 | 36.71 | 51.73 | 49.24 | 41.99 | 50.46 | 47.33 | 82.31
ERFNet [16] | 83.02 | 52.95 | 17.50 | 41.72 | 49.80 | 53.70 | 45.98 | 54.30 | 50.40 | 83.75
CGNet [17] | 81.21 | 60.04 | 17.71 | 42.91 | 53.62 | 57.62 | 46.46 | 53.71 | 51.66 | 83.99
Table 3  Comparison of accuracy metrics of different networks on the SUIM dataset
Model | mIoU/% (0~2 m) | mIoU/% (2~6 m) | PA/% (0~2 m) | PA/% (2~6 m)
Proposed method | 88.63 | 89.01 | 96.08 | 96.10
U-Net [3] | 87.69 | 87.42 | 95.89 | 95.62
SegNet [2] | 83.90 | 82.93 | 94.96 | 94.92
DeepLab [4] | 87.36 | 87.93 | 95.84 | 95.88
PSPNet [7] | 89.08 | 89.29 | 96.31 | 96.33
GCN [24] | 87.37 | 86.97 | 95.82 | 95.73
OCNet [15] | 88.96 | 89.41 | 96.26 | 96.35
SUIM-Net [13] | 88.24 | 88.45 | 95.91 | 95.93
LEDNet [19] | 87.48 | 87.84 | 95.85 | 95.88
BiSeNetV2 [21] | 88.43 | 88.85 | 96.03 | 96.09
ENet [14] | 85.94 | 86.60 | 95.17 | 95.21
ERFNet [16] | 86.72 | 87.05 | 95.36 | 95.48
CGNet [17] | 87.15 | 87.24 | 95.43 | 95.46
Table 4  Comparison of accuracy metrics of different networks on the seagrass dataset
Model | v/(frame·s⁻¹) | p/10⁶ | f/10⁹
Proposed method | 258.94 | 1.45 | 0.31
U-Net [3] | 19.98 | 14.39 | 38.79
SegNet [2] | 17.52 | 28.44 | 61.39
DeepLab [4] | 16.00 | 5.81 | 8.28
PSPNet [7] | 6.65 | 27.50 | 49.78
GCN [24] | 11.26 | 23.95 | 7.09
OCNet [15] | 31.71 | 60.48 | 81.36
SUIM-Net [13] | 27.69 | 3.86 | 4.59
LEDNet [19] | 111.73 | 0.92 | 1.78
BiSeNetV2 [21] | 244.63 | 3.35 | 3.83
ENet [14] | 117.41 | 0.35 | 0.77
ERFNet [16] | 198.36 | 2.06 | 4.64
CGNet [17] | 116.49 | 0.48 | 1.08
Table 5  Comparison of efficiency metrics of different networks (v: inference speed; p: number of parameters; f: floating-point operations)
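For context, speed and parameter figures of the kind reported in Table 5 are typically measured as sketched below; the iteration counts, warm-up length, and the (1, 3, 256, 320) input tensor for a 320×256 image are assumptions, not the paper's protocol. FLOP counts (the f column) usually come from a profiling library such as fvcore or thop rather than hand counting.

```python
import time
import torch

def benchmark(model, input_size=(1, 3, 256, 320), device='cuda', iters=200):
    """Measure parameter count (millions) and inference speed (frames per second)."""
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)
    params = sum(p.numel() for p in model.parameters()) / 1e6

    with torch.no_grad():
        for _ in range(20):                  # warm-up iterations
            model(x)
        torch.cuda.synchronize()             # assumes a CUDA device
        t0 = time.time()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
        fps = iters / (time.time() - t0)

    return params, fps

# params_m, fps = benchmark(net)   # 'net' is any nn.Module to be profiled
```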
Pyramid pooling module | Feature fusion module | Image preprocessing | Auxiliary edge loss function | mIoU/%
 | | | | 50.91
✓ | | | | 52.03
 | ✓ | | | 51.45
✓ | ✓ | | | 52.26
✓ | ✓ | ✓ | | 52.66
✓ | ✓ | ✓ | ✓ | 53.55
Table 6  Comparison of accuracy metrics in the ablation experiments on the SUIM dataset
Backbone network | v/(frame·s⁻¹) | mIoU/%
MobileNetV2 | 213.29 | 51.66
ResNet-18 | 199.27 | 53.90
Proposed (symmetric) | 126.12 | 54.23
Proposed (asymmetric) | 258.94 | 53.55
Table 7  Comparison of metrics in the backbone network ablation experiments
IoULoss | CELoss | OHEMCELoss | BCELoss | mIoU/%
✓ | | | | 49.91
 | ✓ | | | 51.31
 | | ✓ | | 52.66
 | | ✓ | ✓ | 53.55
Table 8  Comparison of accuracy metrics in the loss function ablation experiments
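Table 8 indicates that OHEM cross-entropy, cross-entropy computed only over the hardest pixels, outperforms plain CE and IoU losses as the main segmentation term. A minimal sketch of such a loss follows; the keep ratio and ignore index are assumed values, and practical implementations often add a loss threshold and a minimum kept count.

```python
import torch
import torch.nn.functional as F

def ohem_ce_loss(logits, target, keep_ratio=0.25, ignore_index=255):
    """Cross-entropy averaged over only the hardest pixels (online hard example mining)."""
    loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                           reduction='none').flatten()  # per-pixel losses
    n_keep = max(1, int(keep_ratio * loss.numel()))
    hard, _ = loss.topk(n_keep)              # keep the highest per-pixel losses
    return hard.mean()

logits = torch.randn(2, 8, 256, 320, requires_grad=True)   # 8 SUIM classes
target = torch.randint(0, 8, (2, 256, 320))
loss = ohem_ce_loss(logits, target)
```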
α | mIoU/%
0 | 52.66
0.05 | 53.05
0.10 | 53.55
0.15 | 53.31
0.20 | 52.94
0.25 | 52.88
0.30 | 52.46
Table 9  Comparison of accuracy metrics in the ablation experiments on the balance parameter α
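Table 9 tunes the balance parameter α weighting the auxiliary BCE edge loss against the main segmentation loss, with α = 0.10 performing best. A sketch of the combined objective, using plain cross-entropy as a stand-in for the OHEM term shown above:

```python
import torch.nn.functional as F

def total_loss(seg_logits, seg_target, edge_logits, edge_target, alpha=0.10):
    """Main segmentation loss plus alpha-weighted auxiliary edge loss."""
    seg = F.cross_entropy(seg_logits, seg_target)     # stand-in for the OHEM CE term
    # edge_target: float boundary map in {0, 1}, e.g. from label_edges() above
    edge = F.binary_cross_entropy_with_logits(edge_logits, edge_target)
    return seg + alpha * edge
```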
Fig. 8  Comparison of results of the proposed network with and without the edge loss function
1 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
2 BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495. doi: 10.1109/TPAMI.2016.2644615
3 RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
4 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. [2014-12-22]. https://arxiv.org/abs/1412.7062.
5 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
6 CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2017-06-17]. https://arxiv.org/abs/1706.05587.
7 ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2881-2890.
8 ZHOU Deng-wen, TIAN Jin-yue, MA Lu-yao, et al. Lightweight image semantic segmentation based on multi-level feature cascaded network [J]. Journal of Zhejiang University (Engineering Science), 2020, 54(8): 1516-1524. (in Chinese)
9 LIU F, FANG M. Semantic segmentation of underwater images based on improved Deeplab [J]. Journal of Marine Science and Engineering, 2020, 8(3): 188. doi: 10.3390/jmse8030188
10 ZHOU J, WEI X, SHI J, et al. Underwater image enhancement via two-level wavelet decomposition maximum brightness color restoration and edge refinement histogram stretching [J]. Optics Express, 2022, 30(10): 17290-17306. doi: 10.1364/OE.450858
11 ZHOU J, WANG Y, ZHANG W, et al. Underwater image restoration via feature priors to estimate background light and optimized transmission map [J]. Optics Express, 2021, 29(18): 28228-28245. doi: 10.1364/OE.432900
12 ZHOU J, YANG T, REN W, et al. Underwater image restoration via depth map and illumination estimation based on a single image [J]. Optics Express, 2021, 29(19): 29864-29886. doi: 10.1364/OE.427839
13 ISLAM M J, EDGE C, XIAO Y, et al. Semantic segmentation of underwater imagery: dataset and benchmark [C]// 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas: IEEE, 2020: 1769-1776.
14 PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation [EB/OL]. [2016-06-07]. https://arxiv.org/abs/1606.02147.
15 YUAN Y, HUANG L, GUO J, et al. OCNet: object context for semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(8): 2375-2398. doi: 10.1007/s11263-021-01465-9
16 ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation [J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272.
17 WU T, TANG S, ZHANG R, et al. CGNet: a light-weight context guided network for semantic segmentation [J]. IEEE Transactions on Image Processing, 2020, 30: 1169-1179.
18 LI H, XIONG P, FAN H, et al. DFANet: deep feature aggregation for real-time semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9522-9531.
19 WANG Y, ZHOU Q, LIU J, et al. LEDNet: a lightweight encoder-decoder network for real-time semantic segmentation [C]// 2019 IEEE International Conference on Image Processing. Taipei: IEEE, 2019: 1860-1864.
20 YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 325-341.
21 YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. doi: 10.1007/s11263-021-01515-2
22 REUS G, MÖLLER T, JÄGER J, et al. Looking for seagrass: deep learning for visual coverage estimation [C]// 2018 OCEANS-MTS/IEEE Kobe Techno-Oceans. Kobe: IEEE, 2018: 1-6.
23 SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
24 PENG C, ZHANG X, YU G, et al. Large kernel matters: improve semantic segmentation by global convolutional network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 4353-4361.