Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1278-1286    DOI: 10.3785/j.issn.1008-973X.2023.07.002
    
Lightweight semantic segmentation network for underwater image
Hao-ran GUO(),Ji-chang GUO*(),Yu-dong WANG
School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

Abstract  

A semantic segmentation network was designed for underwater images. A lightweight and efficient encoder-decoder architecture was adopted to balance the trade-off between speed and accuracy. In the encoder, an inverted bottleneck layer and a pyramid pooling module were designed to extract features efficiently. In the decoder, a feature fusion module was constructed to fuse multi-level features, which improved the segmentation accuracy. To address the blurred edges typical of underwater images, an auxiliary edge loss function was used to train the network better, refining the segmentation edges through the supervision of semantic boundaries. Experimental results on the underwater semantic segmentation dataset SUIM show that, for a 320×256-pixel input image, the network achieves 53.55% mean IoU at an inference speed of 258.94 frames per second on one NVIDIA GeForce GTX 1080 Ti card, reaching real-time processing speed while maintaining high accuracy.



Key words: image processing; underwater image; semantic segmentation; edge feature; lightweight network
Received: 27 July 2022      Published: 17 July 2023
CLC:  TP 391  
Fund: Supported by the National Natural Science Foundation of China (62171315)
Corresponding Authors: Ji-chang GUO     E-mail: 2568971284@qq.com;jcguo@tju.edu.cn
Cite this article:

Hao-ran GUO,Ji-chang GUO,Yu-dong WANG. Lightweight semantic segmentation network for underwater image. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1278-1286.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.07.002     OR     https://www.zjujournals.com/eng/Y2023/V57/I7/1278


Fig.1 Overall architecture of lightweight semantic segmentation network for underwater image
| Module | Module type | Output size (W×H×C) |
|---|---|---|
| Module 1-1 | 3×3 convolution (s = 2) | 160×128×12 |
| Module 1-2 | 3×3 convolution (s = 2) | 80×64×12 |
| Module 2-1 | Inverted bottleneck layer (r = 3, s = 2) | 40×32×24 |
| Module 2-2 | Inverted bottleneck layer (r = 6, s = 1) | 40×32×24 |
| Module 3-1 | Inverted bottleneck layer (r = 3, s = 2) | 20×16×48 |
| Module 3-2 | Inverted bottleneck layer (r = 6, s = 1) | 20×16×48 |
| Module 4-1 | Inverted bottleneck layer (r = 3, s = 2) | 10×8×96 |
| Module 4-2 | Inverted bottleneck layer (r = 6, s = 1) | 10×8×96 |
| Module 4-3 | Inverted bottleneck layer (r = 12, s = 1) | 10×8×96 |
| Module 4-4 | Inverted bottleneck layer (r = 18, s = 1) | 10×8×96 |
| Module 5 | Pyramid pooling module | 10×8×48 |
Tab.1 Encoder composition of proposed network
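A minimal sketch of the encoder's inverted bottleneck layer, assuming it follows the MobileNetV2 [23] design (1×1 expansion, 3×3 depthwise convolution, 1×1 linear projection, with a residual when the stride is 1 and channel counts match); the paper's exact layer and the meaning of r (taken here as the expansion ratio) are assumptions, not confirmed by this page.

```python
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    """MobileNetV2-style inverted bottleneck: expand, depthwise, project.
    `r` is assumed to be the expansion ratio, `s` the stride."""
    def __init__(self, c_in: int, c_out: int, r: int = 6, s: int = 1):
        super().__init__()
        c_mid = c_in * r
        self.use_res = (s == 1 and c_in == c_out)
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_mid, 1, bias=False),            # 1x1 expand
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_mid, 3, stride=s, padding=1,
                      groups=c_mid, bias=False),              # 3x3 depthwise
            nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
            nn.Conv2d(c_mid, c_out, 1, bias=False),           # 1x1 linear project
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_res else y

# Module 2-1 from Tab.1: 80x64x12 (W×H×C) -> 40x32x24; tensors are N,C,H,W
x = torch.randn(1, 12, 64, 80)
y = InvertedBottleneck(12, 24, r=3, s=2)(x)
print(y.shape)  # torch.Size([1, 24, 32, 40])
```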
Fig.2 Architectures of inverted bottleneck layer and pyramid pooling module
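The pyramid pooling module can be sketched in the PSPNet [7] style: pool the input at several bin sizes, project each branch with a 1×1 convolution, upsample back, concatenate with the input, and project down. The bin sizes and branch widths below are assumptions; the paper's exact configuration is shown only in Fig.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling module (bins and branch widths are
    assumed; only the input/output sizes 10x8x96 -> 10x8x48 come from Tab.1)."""
    def __init__(self, c_in: int, c_out: int, bins=(1, 2, 3, 6)):
        super().__init__()
        c_branch = c_in // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(c_in, c_branch, 1, bias=False))
            for b in bins])
        self.project = nn.Conv2d(c_in + c_branch * len(bins), c_out, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                     align_corners=False)
                       for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# Module 5 from Tab.1: 10x8x96 (W×H×C) -> 10x8x48; tensors are N,C,H,W
x = torch.randn(1, 96, 8, 10)
out = PyramidPooling(96, 48)(x)
print(out.shape)  # torch.Size([1, 48, 8, 10])
```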
| Stage | Module type | Output size (W×H×C) |
|---|---|---|
| Stage 1 | Feature fusion module | 20×16×48 |
| Stage 2 | Feature fusion module | 40×32×24 |
| Stage 3 | Upsampling (8×) | 320×256×N |
Tab.2 Decoder composition of proposed network
Fig.3 Architecture of feature fusion module
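One common shape for a decoder feature fusion module, shown here only as a hedged sketch (the paper's actual design is given in Fig.3 and is not reproduced on this page): upsample the deeper feature, project the shallow skip feature to the same width, sum, and refine with a 3×3 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Generic multi-level fusion sketch (an assumption, not the paper's
    exact module): align resolutions and channels, add, then refine."""
    def __init__(self, c_deep: int, c_skip: int, c_out: int):
        super().__init__()
        self.reduce = nn.Conv2d(c_deep, c_out, 1, bias=False)  # match channels
        self.proj = nn.Conv2d(c_skip, c_out, 1, bias=False)
        self.refine = nn.Sequential(
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, deep, skip):
        deep = F.interpolate(self.reduce(deep), size=skip.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.refine(deep + self.proj(skip))

# Decoder stage 1 from Tab.2: fuse 10x8x48 (deep) with a 20x16x48 skip feature
deep = torch.randn(1, 48, 8, 10)
skip = torch.randn(1, 48, 16, 20)
fused = FeatureFusion(48, 48, 48)(deep, skip)
print(fused.shape)  # torch.Size([1, 48, 16, 20])
```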
Fig.4 Edge feature extractions of Ground Truth and segmentation results
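Semantic boundaries for the edge supervision in Fig.4 can be derived from a class-label mask by marking pixels that differ from a 4-neighbour; the exact operator the paper uses is not stated on this page, so the following is a plausible sketch only.

```python
import numpy as np

def semantic_boundary(label: np.ndarray) -> np.ndarray:
    """Binary boundary map of a 2-D integer class mask: a pixel is marked
    when any of its 4-neighbours has a different class (a sketch; the
    paper's edge-extraction operator is an assumption)."""
    edge = np.zeros(label.shape, dtype=bool)
    edge[:, 1:]  |= label[:, 1:] != label[:, :-1]   # left neighbour differs
    edge[:, :-1] |= label[:, 1:] != label[:, :-1]   # right neighbour differs
    edge[1:, :]  |= label[1:, :] != label[:-1, :]   # upper neighbour differs
    edge[:-1, :] |= label[1:, :] != label[:-1, :]   # lower neighbour differs
    return edge

# Toy 4x4 mask split into class 0 | class 1 down the middle:
# both columns adjacent to the split are boundary pixels -> 8 in total.
mask = np.zeros((4, 4), dtype=int)
mask[:, 2:] = 1
print(semantic_boundary(mask).sum())  # 8
```

The resulting binary map can then supervise an edge head with a binary cross-entropy loss, as Tab.8 suggests.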
Fig.5 Experimental results on SUIM dataset of proposed network compared with classical network
Fig.6 Experimental results on seagrass dataset of proposed network compared with classical network
Fig.7 Experimental failure cases on SUIM dataset of proposed network
Per-class columns report IoU/%.

| Model | BW | HD | PF | WR | RO | RI | FV | SR | mIoU/% | PA/% |
|---|---|---|---|---|---|---|---|---|---|---|
| Proposed method | 84.62 | 63.99 | 18.46 | 41.84 | 61.93 | 53.44 | 46.00 | 58.42 | 53.55 | 85.32 |
| U-Net[3] | 79.46 | 32.25 | 21.85 | 33.94 | 23.65 | 50.28 | 38.16 | 42.16 | 39.85 | 79.44 |
| SegNet[2] | 80.63 | 45.67 | 17.45 | 32.24 | 55.72 | 47.62 | 43.92 | 51.51 | 46.85 | 82.19 |
| Deeplab[4] | 81.82 | 50.26 | 17.05 | 43.33 | 63.60 | 57.18 | 43.59 | 55.35 | 51.52 | 84.27 |
| PSPNet[7] | 82.51 | 65.04 | 28.54 | 46.56 | 62.88 | 55.80 | 46.78 | 55.98 | 55.51 | 86.41 |
| GCN[24] | 79.32 | 38.57 | 15.09 | 30.38 | 54.25 | 49.94 | 36.09 | 52.02 | 44.46 | 81.28 |
| OCNet[15] | 83.14 | 64.03 | 24.31 | 43.11 | 61.78 | 54.92 | 47.41 | 54.97 | 54.30 | 85.89 |
| SUIMNet[13] | 80.64 | 63.45 | 23.27 | 41.25 | 60.89 | 53.12 | 46.02 | 57.12 | 53.22 | 85.22 |
| LEDNet[19] | 82.96 | 58.47 | 18.02 | 42.86 | 50.96 | 58.13 | 46.13 | 54.99 | 51.36 | 84.25 |
| BiseNetv2[21] | 83.67 | 59.29 | 18.27 | 39.58 | 56.54 | 58.16 | 47.33 | 56.93 | 52.47 | 84.96 |
| ENet[14] | 80.94 | 50.60 | 16.97 | 36.71 | 51.73 | 49.24 | 41.99 | 50.46 | 47.33 | 82.31 |
| ERFNet[16] | 83.02 | 52.95 | 17.50 | 41.72 | 49.80 | 53.70 | 45.98 | 54.30 | 50.40 | 83.75 |
| CGNet[17] | 81.21 | 60.04 | 17.71 | 42.91 | 53.62 | 57.62 | 46.46 | 53.71 | 51.66 | 83.99 |
Tab.3 Comparison results of accuracy index on SUIM dataset in each network
| Model | mIoU/% (0~2 m) | mIoU/% (2~6 m) | PA/% (0~2 m) | PA/% (2~6 m) |
|---|---|---|---|---|
| Proposed method | 88.63 | 89.01 | 96.08 | 96.10 |
| U-Net[3] | 87.69 | 87.42 | 95.89 | 95.62 |
| SegNet[2] | 83.90 | 82.93 | 94.96 | 94.92 |
| Deeplab[4] | 87.36 | 87.93 | 95.84 | 95.88 |
| PSPNet[7] | 89.08 | 89.29 | 96.31 | 96.33 |
| GCN[24] | 87.37 | 86.97 | 95.82 | 95.73 |
| OCNet[15] | 88.96 | 89.41 | 96.26 | 96.35 |
| SUIMNet[13] | 88.24 | 88.45 | 95.91 | 95.93 |
| LEDNet[19] | 87.48 | 87.84 | 95.85 | 95.88 |
| BiseNetv2[21] | 88.43 | 88.85 | 96.03 | 96.09 |
| ENet[14] | 85.94 | 86.60 | 95.17 | 95.21 |
| ERFNet[16] | 86.72 | 87.05 | 95.36 | 95.48 |
| CGNet[17] | 87.15 | 87.24 | 95.43 | 95.46 |
Tab.4 Comparison results of accuracy index in each network on seagrass dataset
| Model | v/(frame·s⁻¹) | p/10⁶ | f/10⁹ |
|---|---|---|---|
| Proposed method | 258.94 | 1.45 | 0.31 |
| U-Net[3] | 19.98 | 14.39 | 38.79 |
| SegNet[2] | 17.52 | 28.44 | 61.39 |
| Deeplab[4] | 16.00 | 5.81 | 8.28 |
| PSPNet[7] | 6.65 | 27.50 | 49.78 |
| GCN[24] | 11.26 | 23.95 | 7.09 |
| OCNet[15] | 31.71 | 60.48 | 81.36 |
| SUIMNet[13] | 27.69 | 3.86 | 4.59 |
| LEDNet[19] | 111.73 | 0.92 | 1.78 |
| BiseNetv2[21] | 244.63 | 3.35 | 3.83 |
| ENet[14] | 117.41 | 0.35 | 0.77 |
| ERFNet[16] | 198.36 | 2.06 | 4.64 |
| CGNet[17] | 116.49 | 0.48 | 1.08 |
Tab.5 Comparison results of efficiency index in each network
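Throughput numbers like those in Tab.5 depend on the timing protocol. A common measurement sketch is shown below; this is an assumption, since the paper does not publish its timing code. On GPU, `torch.cuda.synchronize()` must bracket the timed region so that queued kernels are actually counted.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size=(1, 3, 256, 320), warmup=10, iters=50, device="cpu"):
    """Rough frames-per-second measurement: warm up, then time `iters`
    forward passes on a fixed-size random input."""
    model = model.to(device).eval()
    x = torch.randn(*size, device=device)
    for _ in range(warmup):          # warm-up passes (allocator, autotuning)
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - t0)

# Tiny stand-in model; the paper's numbers use its full network on a GTX 1080 Ti.
fps = measure_fps(torch.nn.Conv2d(3, 8, 3, padding=1), warmup=3, iters=10)
print(f"{fps:.1f} frames/s")
```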
Ablation on SUIM: the check-mark columns indicating which of the pyramid pooling module, feature fusion module, image preprocessing, and auxiliary edge loss function each row enables were lost in extraction; the surviving mIoU/% values are 50.91, 52.03, 51.45, 52.26, 52.66, and 53.55 (full model).
Tab.6 Comparison results of accuracy indicators for ablation experiments on SUIM dataset
| Backbone network | v/(frame·s⁻¹) | mIoU/% |
|---|---|---|
| MobileNetV2 | 213.29 | 51.66 |
| ResNet-18 | 199.27 | 53.90 |
| Proposed method (symmetric) | 126.12 | 54.23 |
| Proposed method (asymmetric) | 258.94 | 53.55 |
Tab.7 Comparison results of different indexes in baseline network ablation experiments
Loss-function ablation: the marks indicating which of the IoU loss, CE loss, OHEM-CE loss, and BCE loss each row combines were lost in extraction; the surviving mIoU/% values are 49.91, 51.31, 52.66, and 53.55 (best).
Tab.8 Comparison results of accuracy index in loss function ablation experiments
| α | mIoU/% |
|---|---|
| 0 | 52.66 |
| 0.05 | 53.05 |
| 0.10 | 53.55 |
| 0.15 | 53.31 |
| 0.20 | 52.94 |
| 0.25 | 52.88 |
| 0.30 | 52.46 |
Tab.9 Comparison results of accuracy index in balance parameter α ablation experiments
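Assuming the total training loss is the standard linear combination of a segmentation term and the auxiliary edge term (the page does not print the formula), the balance parameter swept in Tab.9 would enter as

```latex
L_{\mathrm{total}} = L_{\mathrm{seg}} + \alpha \, L_{\mathrm{edge}}
```

with an OHEM cross-entropy for $L_{\mathrm{seg}}$ and a binary cross-entropy on boundary maps for $L_{\mathrm{edge}}$, as Tab.8 suggests, and $\alpha = 0.10$ giving the best mIoU.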
Fig.8 Comparison results of proposed network with or without edge loss function
[1]   LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Santiago: IEEE, 2015: 3431-3440.
[2]   BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
doi: 10.1109/TPAMI.2016.2644615
[3]   RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241.
[4]   CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. [2014-12-22]. https://arxiv.org/abs/1412.7062.
[5]   CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[6]   CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2017-06-17]. https://arxiv.org/abs/1706.05587.
[7]   ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2881-2890.
[8]   ZHOU Deng-wen, TIAN Jin-yue, MA Lu-yao, et al. Lightweight image semantic segmentation based on multi-level feature cascaded network [J]. Journal of Zhejiang University: Engineering Science, 2020, 54(8): 1516-1524.
[9]   LIU F, FANG M. Semantic segmentation of underwater images based on improved Deeplab [J]. Journal of Marine Science and Engineering, 2020, 8(3): 188.
doi: 10.3390/jmse8030188
[10]   ZHOU J, WEI X, SHI J, et al. Underwater image enhancement via two-level wavelet decomposition maximum brightness color restoration and edge refinement histogram stretching [J]. Optics Express, 2022, 30(10): 17290-17306.
doi: 10.1364/OE.450858
[11]   ZHOU J, WANG Y, ZHANG W, et al. Underwater image restoration via feature priors to estimate background light and optimized transmission map [J]. Optics Express, 2021, 29(18): 28228-28245.
doi: 10.1364/OE.432900
[12]   ZHOU J, YANG T, REN W, et al. Underwater image restoration via depth map and illumination estimation based on a single image [J]. Optics Express, 2021, 29(19): 29864-29886.
doi: 10.1364/OE.427839
[13]   ISLAM M J, EDGE C, XIAO Y, et al. Semantic segmentation of underwater imagery: dataset and benchmark [C]// 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas: IEEE, 2020: 1769-1776.
[14]   PASZKE A, CHAURASIA A, KIM S, et al. Enet: a deep neural network architecture for real-time semantic segmentation [EB/OL]. [2016-06-07]. https://arxiv.org/abs/1606.02147.
[15]   YUAN Y, HUANG L, GUO J, et al. OCNet: object context for semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(8): 2375-2398.
doi: 10.1007/s11263-021-01465-9
[16]   ROMERA E, ALVAREZ J M, BERGASA L M, et al. ERFNet: efficient residual factorized convnet for real-time semantic segmentation [J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 19(1): 263-272.
[17]   WU T, TANG S, ZHANG R, et al. CGNet: a light-weight context guided network for semantic segmentation [J]. IEEE Transactions on Image Processing, 2020, 30: 1169-1179.
[18]   LI H, XIONG P, FAN H, et al. DFANet: deep feature aggregation for real-time semantic segmentation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9522-9531.
[19]   WANG Y, ZHOU Q, LIU J, et al. LEDNet: a lightweight encoder-decoder network for real-time semantic segmentation [C]// 2019 IEEE International Conference on Image Processing. Taipei: IEEE, 2019: 1860-1864.
[20]   YU C, WANG J, PENG C, et al. Bisenet: bilateral segmentation network for real-time semantic segmentation [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 325-341.
[21]   YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068.
doi: 10.1007/s11263-021-01515-2
[22]   REUS G, MÖLLER T, JÄGER J, et al. Looking for seagrass: deep learning for visual coverage estimation [C]// 2018 OCEANS-MTS/IEEE Kobe Techno-Oceans. Kobe: IEEE, 2018: 1-6.
[23]   SANDLER M, HOWARD A, ZHU M, et al. Mobilenetv2: inverted residuals and linear bottlenecks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4510-4520.
[24]   PENG C, ZHANG X, YU G, et al. Large kernel matters-improve semantic segmentation by global convolutional network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 4353-4361.