Journal of Zhejiang University (Engineering Science)  2022, Vol. 56, Issue 9: 1796-1805    DOI: 10.3785/j.issn.1008-973X.2022.09.013
Computer and Control Engineering
Medical image segmentation method combining multi-scale and multi-head attention
Wan-liang WANG1,2, Tie-jun WANG1, Jia-cheng CHEN1, Wen-bo YOU1
1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
2. College of Information Science and Technology, Zhejiang Shuren University, Hangzhou 310015, China
Abstract:

A neural-network-based segmentation model, MS2Net, was proposed to extract regions of interest from medical images automatically and accurately. To better extract context information, a network architecture combining convolution and the Transformer was proposed, addressing the inability of traditional convolution operations to capture long-range dependencies. In the Transformer-based context extraction module, multi-head self-attention was used to obtain the similarity relationships between pixels, and the features of all pixels were fused according to these relationships to give the network a global view, while relative positional encoding enabled the Transformer to retain the structural information of the input feature map. To adapt the network to regions of interest of different shapes and sizes, MS2Net used multi-scale decoder features and a multi-scale attention mechanism was proposed: group channel attention and group spatial attention were applied to the multi-scale feature map in turn, so that the network adaptively selected reasonable multi-scale semantic information. MS2Net achieved a better intersection over union than advanced methods such as U-Net, CE-Net, DeepLab v3+, and UTNet on both the ISBI 2017 and CVC-ColonDB datasets, demonstrating its good generalization ability.

Key words: medical image segmentation; deep learning; attention; Transformer; multi-scale
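
The context extraction module described in the abstract is built on multi-head self-attention with relative positional encoding. The following is a minimal PyTorch sketch of that general mechanism over a 2D feature map, offered as an illustration rather than the authors' implementation: the module name MHSA2d, the fixed feature-map size, and the learned relative-position bias table are all assumptions.

```python
# Minimal sketch: multi-head self-attention over a feature map with a learned
# 2D relative positional bias. Illustrative assumptions throughout; this is
# not the MS2Net source code.
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    def __init__(self, dim: int, heads: int = 4, size: int = 16):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)   # pixel-wise Q, K, V
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)
        # One learnable bias per head and relative offset: (2s-1)^2 offsets.
        self.rel_bias = nn.Parameter(torch.zeros(heads, (2 * size - 1) ** 2))
        # For every pixel pair, precompute the index of their relative offset.
        ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
        coords = torch.stack([ys.flatten(), xs.flatten()])           # (2, N)
        rel = coords[:, :, None] - coords[:, None, :] + size - 1     # (2, N, N)
        self.register_buffer("rel_idx", rel[0] * (2 * size - 1) + rel[1])

    def forward(self, x):                      # x: (B, C, H, W), H == W == size
        B, C, H, W = x.shape
        q, k, v = self.qkv(x).reshape(B, 3, self.heads, self.head_dim, H * W).unbind(1)
        attn = (q.transpose(-2, -1) @ k) / self.head_dim ** 0.5      # (B, h, N, N)
        attn = attn + self.rel_bias[:, self.rel_idx]   # inject structural information
        out = v @ attn.softmax(dim=-1).transpose(-2, -1)   # attention-weighted values
        return self.proj(out.reshape(B, C, H, W))

# Usage: y = MHSA2d(dim=64, heads=4, size=16)(torch.randn(2, 64, 16, 16))
```

Each pixel's output is a similarity-weighted mixture of every other pixel's features, which is the "global view" the abstract refers to; the bias table plays the role of the relative positional encoding.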
Received: 2021-08-05    Published: 2022-09-28
CLC:  TP 391  
Supported by: National Natural Science Foundation of China (61873240)
About the author: WANG Wan-liang (1957—), male, professor, engaged in research on big data, deep learning, and intelligent scheduling. orcid.org/0000-0002-1552-5075. E-mail: zjutwwl@zjut.edu.cn
Cite this article:

Wan-liang WANG, Tie-jun WANG, Jia-cheng CHEN, Wen-bo YOU. Medical image segmentation method combining multi-scale and multi-head attention. Journal of Zhejiang University (Engineering Science), 2022, 56(9): 1796-1805.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.09.013        https://www.zjujournals.com/eng/CN/Y2022/V56/I9/1796

Fig. 1  MS2Net framework and enlarged views of its local modules
Fig. 2  Multi-scale attention mechanism
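
The multi-scale attention of Fig. 2 applies group channel attention (GCA) followed by group spatial attention (GSA) to grouped multi-scale features. The PyTorch sketch below is one plausible reading under stated assumptions, combining an SE-style channel gate and a CBAM-style spatial gate computed per scale group via grouped convolutions; GroupMSA, n_scales, and reduction are invented names, not the authors' code.

```python
# Hedged sketch of "group channel attention then group spatial attention":
# decoder features from several scales are concatenated into groups, each
# group is reweighted channel-wise, then pixel-wise. Illustrative only.
import torch
import torch.nn as nn

class GroupMSA(nn.Module):
    def __init__(self, dim: int, n_scales: int = 4, reduction: int = 4):
        super().__init__()
        self.n = n_scales
        # SE-style gate computed independently per scale group (grouped 1x1 convs).
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim * n_scales, dim * n_scales // reduction, 1, groups=n_scales),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim * n_scales // reduction, dim * n_scales, 1, groups=n_scales),
            nn.Sigmoid(),
        )
        # One spatial gate map per group, from that group's avg/max channel stats.
        self.spatial_gate = nn.Conv2d(2 * n_scales, n_scales, 7, padding=3, groups=n_scales)

    def forward(self, feats):                 # feats: list of n (B, dim, H, W) maps
        x = torch.cat(feats, dim=1)           # (B, n*dim, H, W), one group per scale
        x = x * self.channel_gate(x)          # group channel attention
        B, C, H, W = x.shape
        g = x.view(B, self.n, C // self.n, H, W)
        stats = torch.stack([g.mean(dim=2), g.amax(dim=2)], dim=2).view(B, 2 * self.n, H, W)
        gate = torch.sigmoid(self.spatial_gate(stats))     # (B, n, H, W)
        return (g * gate.unsqueeze(2)).view(B, C, H, W)    # group spatial attention

# Usage: out = GroupMSA(64)([torch.randn(2, 64, 56, 56) for _ in range(4)])
```

The grouping lets each scale keep its own gates, so the network can emphasize coarse or fine scales per image, matching the abstract's claim of adaptively selecting multi-scale semantic information.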
Combination  JA     DI     ACC    SEN    SPE    TJI
BN           77.27  85.71  93.65  84.70  96.27  68.26
BN+MSA       78.08  86.19  93.97  85.95  96.17  69.85
BN+CE        77.91  86.00  93.77  84.63  96.67  70.01
MS2Net       78.43  86.28  93.81  85.96  96.72  71.60
Table 1  Effect of different module combinations on segmentation performance (unit: %)
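
The column metrics used throughout these tables are standard binary-segmentation measures: JA (Jaccard index), DI (Dice coefficient), ACC (pixel accuracy), SEN (sensitivity), and SPE (specificity). The sketch below computes them from confusion-matrix counts; TJI is assumed here to be the ISIC-style thresholded Jaccard index (per-image JA zeroed below 0.65), which may differ from the paper's exact definition.

```python
# Hedged sketch of the segmentation metrics reported in the tables.
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray, t: float = 0.65) -> dict:
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)        # lesion pixels correctly predicted
    tn = np.sum(~pred & ~gt)      # background pixels correctly predicted
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    ja = tp / (tp + fp + fn + 1e-8)
    return {
        "JA":  ja,                                   # Jaccard / intersection over union
        "DI":  2 * tp / (2 * tp + fp + fn + 1e-8),   # Dice coefficient
        "ACC": (tp + tn) / (tp + tn + fp + fn),      # pixel accuracy
        "SEN": tp / (tp + fn + 1e-8),                # sensitivity (recall)
        "SPE": tn / (tn + fp + 1e-8),                # specificity
        "TJI": ja if ja >= t else 0.0,               # assumed ISIC-style thresholded JA
    }
```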
Pre-training  JA     DI     ACC    SEN
Without       77.97  85.98  93.45  85.73
With          78.43  86.28  93.81  85.96
Table 2  Effect of pre-training on segmentation performance (unit: %)
Fusion method  JA/%   DI/%   ACC/%  SEN/%  NP/10^6  FLOPs/10^9
Concatenation  77.83  85.96  93.79  84.55  22.47    17.59
Addition       78.43  86.28  93.81  85.96  21.66    16.96
Table 3  Effect of feature-map fusion method at skip connections on segmentation performance
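
Table 3 contrasts concatenation ("stacking") with element-wise addition for fusing encoder and decoder feature maps at skip connections. The sketch below, with illustrative shapes only, shows where the parameter and FLOP difference comes from: concatenation doubles the channel count seen by the next convolution, while addition keeps it unchanged.

```python
# Minimal sketch of the two skip-connection fusion styles in Table 3.
import torch
import torch.nn as nn

enc = torch.randn(1, 64, 56, 56)   # encoder feature passed over the skip connection
dec = torch.randn(1, 64, 56, 56)   # upsampled decoder feature

# Stacking: concat -> the following 3x3 conv needs 128 input channels.
conv_cat = nn.Conv2d(128, 64, kernel_size=3, padding=1)
y_cat = conv_cat(torch.cat([enc, dec], dim=1))

# Addition: element-wise sum -> the conv keeps 64 input channels, roughly
# halving this layer's parameters and FLOPs, consistent with Table 3.
conv_add = nn.Conv2d(64, 64, kernel_size=3, padding=1)
y_add = conv_add(enc + dec)
```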
T  JA     DI     ACC    SEN
1  77.97  86.03  93.80  85.42
2  78.08  86.19  93.67  85.56
4  78.43  86.28  93.81  85.96
8  78.19  86.17  93.80  86.18
Table 4  Effect of the number of Transformer blocks on segmentation metrics with 32 attention heads (unit: %)
N   JA     DI     ACC    SEN
1   77.53  85.69  93.65  85.42
2   77.75  85.91  93.58  85.33
4   77.84  85.91  93.83  84.97
8   77.88  85.85  93.24  86.24
16  78.11  86.22  93.93  85.99
32  78.43  86.28  93.81  85.96
Table 5  Effect of the number of attention heads on segmentation metrics with 4 Transformer blocks (unit: %)
Relative positional encoding  JA     DI     ACC    SEN
Without                       77.86  85.94  93.76  84.50
With                          78.43  86.28  93.81  85.96
Table 6  Effect of relative positional encoding on segmentation performance (unit: %)
GCA  GSA  JA     DI     ACC    SEN
–    –    77.75  85.87  93.65  84.53
√    –    78.13  86.15  93.54  85.72
–    √    77.88  85.94  93.49  85.89
√    √    78.43  86.28  93.81  85.96
Table 7  Contributions of group channel attention (GCA) and group spatial attention (GSA) to segmentation metrics (unit: %)
Fig. 3  Visualization of attention weights for images with different lesions
Fusion method   JA     DI     ACC    SEN
Direct fusion   77.75  85.87  93.65  84.53
SE              77.59  85.79  93.48  85.25
CBAM            78.02  86.03  93.62  85.74
MS-Dual-Guided  77.83  85.80  93.60  85.86
MSA             78.43  86.28  93.81  85.96
Table 8  Comparison of segmentation metrics for different fusion methods (unit: %)
Method                  JA     DI     ACC    SEN    SPE    TJI
U-Net                   72.81  81.78  92.23  80.36  97.33  60.83
Attention U-Net         72.93  81.89  92.10  81.72  96.97  61.27
Swin-Unet               66.04  75.61  90.46  79.11  93.81  52.58
RAUNet                  77.26  85.49  93.68  83.48  97.50  69.47
SFUNet                  76.15  84.57  93.38  82.98  96.83  67.01
DeepLab v3+ (Xception)  77.37  85.67  93.61  83.96  96.90  69.10
CE-Net                  77.46  85.43  93.68  83.84  97.12  70.49
CA-Net                  77.16  85.38  93.13  85.80  95.53  68.56
UTNet                   77.47  85.51  93.55  87.15  95.48  70.10
MS-Dual-Guided          76.48  84.72  92.65  87.11  94.54  68.17
MS2Net                  78.43  86.28  93.81  85.96  96.72  71.60
Table 9  Comparison of segmentation performance of different algorithms on the ISBI 2017 dataset (unit: %)
Method                  JA          DI          ACC         SEN
U-Net                   76.70±1.73  83.97±1.71  97.85±0.19  84.12±3.10
Attention U-Net         76.71±3.00  83.65±2.95  97.98±0.40  83.61±3.35
Swin-Unet               34.36±2.62  44.84±2.92  92.05±0.65  54.79±2.43
RAUNet                  82.41±1.79  88.81±1.89  98.63±0.20  89.12±1.71
SFUNet                  80.12±0.79  87.23±0.81  98.32±0.13  87.86±1.77
DeepLab v3+ (Xception)  79.34±1.24  85.61±1.41  98.52±0.10  86.19±1.38
CE-Net                  81.71±1.65  87.95±1.77  98.50±0.08  88.79±2.08
CA-Net                  76.01±1.60  83.73±1.63  97.57±0.29  85.35±1.93
UTNet                   78.39±2.32  85.51±2.30  98.20±0.21  85.96±2.68
MS2Net                  82.83±1.71  89.19±1.89  98.52±0.22  89.68±1.95
Table 10  Comparison of segmentation performance of different algorithms on the CVC-ColonDB dataset (unit: %)
Fig. 4  Comparison of medical image segmentation results of different methods
1 LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 3431-3440.
2 RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich: Springer, 2015: 234-241.
3 HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(9): 1904-1916.
4 ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2881-2890.
5 GU Z, CHENG J, FU H, et al. CE-Net: context encoder network for 2D medical image segmentation [J]. IEEE Transactions on Medical Imaging, 2019, 38(10): 2281-2292. doi: 10.1109/TMI.2019.2903562
6 WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7794-7803.
7 XING X, YUAN Y, MENG M Q. Zoom in lesions for better diagnosis: attention guided deformation network for WCE image classification [J]. IEEE Transactions on Medical Imaging, 2020, 39(12): 4047-4059. doi: 10.1109/TMI.2020.3010102
8 LIU R, LIU M, SHENG B, et al. NHBS-Net: a feature fusion attention network for ultrasound neonatal hip bone segmentation [J]. IEEE Transactions on Medical Imaging, 2021, 40(12): 3446-3458. doi: 10.1109/TMI.2021.3087857
9 CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
10 CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2021-08-01]. https://arxiv.org/abs/1706.05587.
11 SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning [C]// Thirty-First AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4278-4284.
12 ZHU X, HU H, LIN S, et al. Deformable ConvNets v2: more deformable, better results [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 9308-9316.
13 WANG X, JIANG X, DING H, et al. Bi-directional dermoscopic feature learning and multi-scale consistent decision fusion for skin lesion segmentation [J]. IEEE Transactions on Image Processing, 2019, 29: 3039-3051.
14 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
15 GAO Ying-qi, GUO Song, LI Ning, et al. Arteriovenous classification method in fundus images based on semantic fusion [J]. Journal of Image and Graphics, 2020, 25(10): 2259-2270. doi: 10.11834/jig.200187
16 NI Z L, BIAN G B, ZHOU X H, et al. RAUNet: residual attention U-Net for semantic segmentation of cataract surgical instruments [C]// International Conference on Neural Information Processing. Shenzhen: Springer, 2019: 139-149.
17 OKTAY O, SCHLEMPER J, FOLGOC L L, et al. Attention U-Net: learning where to look for the pancreas [EB/OL]. [2021-08-01]. https://arxiv.org/abs/1804.03999.
18 PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module [EB/OL]. [2021-08-01]. https://arxiv.org/abs/1807.06514.
19 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 3-19.
20 GU R, WANG G, SONG T, et al. CA-Net: comprehensive attention convolutional neural networks for explainable medical image segmentation [J]. IEEE Transactions on Medical Imaging, 2020, 40(2): 699-711.
21 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale [EB/OL]. [2021-08-01]. https://arxiv.org/abs/2010.11929.
22 LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
23 GUO J, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers [EB/OL]. [2021-08-01]. https://arxiv.org/abs/2107.06263.
24 CAO H, WANG Y, CHEN J, et al. Swin-Unet: Unet-like pure Transformer for medical image segmentation [EB/OL]. [2021-08-01]. https://arxiv.org/abs/2105.05537.
25 GAO Y, ZHOU M, METAXAS D. UTNet: a hybrid Transformer architecture for medical image segmentation [EB/OL]. [2021-08-01]. https://arxiv.org/abs/2107.00781.
26 CODELLA N C F, GUTMAN D, CELEBI M E, et al. Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC) [C]// 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Washington DC: IEEE, 2018: 168-172.
27 BERNAL J, SANCHEZ J, VILARINO F. Towards automatic polyp detection with a polyp appearance model [J]. Pattern Recognition, 2012, 45(9): 3166-3182. doi: 10.1016/j.patcog.2012.03.002
28 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
29 WU H, PAN J, LI Z, et al. Automated skin lesion segmentation via an adaptive dual attention module [J]. IEEE Transactions on Medical Imaging, 2020, 40(1): 357-370.