Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (5): 865-874    DOI: 10.3785/j.issn.1008-973X.2023.05.002
Computer Technology and Control Engineering
Structured image super-resolution network based on improved Transformer
Xin-dong LV(),Jiao LI,Zhen-nan DENG,Hao FENG,Xin-tong CUI,Hong-xia DENG*()
College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
Abstract:

Most existing structured-image super-resolution reconstruction algorithms can only solve a single, specific type of structured-image super-resolution problem. A structured-image super-resolution network based on an improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of the Transformer to mine a wide range of global information in spatial sequences. A spatial attention unit was built with the hourglass block structure to focus on the mapping relationship between the low-resolution space and the high-resolution space in local regions and to extract the structured information in the image mapping process. An efficient channel attention (ECA) module was used to fuse the features of the self-attention module and the spatial attention module. TransSRNet was evaluated on the highly structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. Evaluation results showed that TransSRNet had better overall performance than mainstream super-resolution algorithms. With an upscale factor of 8, the PSNR on the face dataset and the medical image dataset reached 28.726 and 26.392 dB respectively, and the SSIM reached 0.844 and 0.881 respectively.
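The self-attention step described in the abstract can be illustrated with a minimal, framework-free sketch. This is scaled dot-product attention over a sequence of feature vectors; the identity Q/K/V projections and single head are simplifications for illustration — the paper's actual learned projections and layout are not specified on this page:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    # Scaled dot-product self-attention over a sequence of feature vectors.
    # Identity projections stand in for the learned Q/K/V weight matrices.
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)  # attention weights over all positions
        out.append([sum(wj * vj[i] for wj, vj in zip(w, x)) for i in range(d)])
    return out
```

Each output vector is a convex combination of all input vectors, which is how the mechanism aggregates global information across the whole spatial sequence.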

Key words: convolutional neural network    Transformer    self-attention    spatial attention    image super-resolution reconstruction
Received: 2022-07-25    Published: 2023-05-09
CLC: TP 391
Supported by: Central Government Guided Local Science and Technology Development Fund of Shanxi Province (YDZJSX2021C005, YDZJSX2022A016); 2022 Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A2221)
Corresponding author: Hong-xia DENG    E-mail: 865877436@qq.com; denghongxia@tyut.edu.cn
About the author: Xin-dong LV (1997—), male, master's student, engaged in research on image super-resolution reconstruction. orcid.org/0000-0002-2109-6756. E-mail: 865877436@qq.com

Cite this article:


Xin-dong LV, Jiao LI, Zhen-nan DENG, Hao FENG, Xin-tong CUI, Hong-xia DENG. Structured image super-resolution network based on improved Transformer. Journal of Zhejiang University (Engineering Science), 2023, 57(5): 865-874.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.05.002        https://www.zjujournals.com/eng/CN/Y2023/V57/I5/865

Fig. 1  Structure of the hourglass block
Fig. 2  Structure of TransSRNet
Fig. 3  Structure of the spatial attention unit
Fig. 4  Structure of the encoder and decoder
Fig. 5  Structure of the residual Transformer block
Fig. 6  Structure of the ECA module
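The ECA module shown in Fig. 6 gates channels with a 1-D convolution over globally pooled channel descriptors followed by a sigmoid. A rough sketch, in which a fixed averaging kernel stands in for the module's learned 1-D convolution weights (an assumption for illustration only):

```python
import math

def eca_weights(channel_means, k=3):
    # ECA-style channel gating sketch: 1-D convolution across the
    # globally averaged channel descriptors, then a sigmoid per channel.
    pad = k // 2
    # Replicate-pad the ends so every channel gets a full-size window.
    padded = [channel_means[0]] * pad + list(channel_means) + [channel_means[-1]] * pad
    kernel = [1.0 / k] * k  # placeholder for the learned 1-D conv weights
    out = []
    for i in range(len(channel_means)):
        s = sum(kernel[j] * padded[i + j] for j in range(k))
        out.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate in (0, 1)
    return out
```

The resulting per-channel weights would multiply the corresponding feature maps, letting the fusion step emphasize informative channels without the dimensionality reduction used by SE blocks.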
Fig. 7  Effect of the number of spatial attention units on PSNR and SSIM
NT | PSNR/dB | SSIM
2  | 28.479  | 0.838
4  | 28.535  | 0.839
6  | 28.568  | 0.839
8  | 28.343  | 0.834
Table 1  Effect of the number of residual Transformer blocks on PSNR and SSIM
Experiment | PSNR/dB | SSIM
1          | 28.341  | 0.834
2          | 26.089  | 0.763
3          | 28.568  | 0.839
Table 2  Effect of retaining different attention modules on PSNR and SSIM
Loss function            | PSNR/dB | SSIM
l_pix                    | 28.568  | 0.839
l_pix + l_style          | 28.598  | 0.839
l_pix + l_style + l_ssim | 28.632  | 0.841
Table 3  Effect of combining different loss functions on PSNR and SSIM
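The joint objectives compared in Table 3 amount to a weighted sum of a pixel term, a style term and an SSIM term. A minimal sketch, assuming an L1 pixel loss and unit weights — the page does not state the paper's actual pixel-loss form or weighting:

```python
def l1_pixel_loss(sr, hr):
    # Mean absolute error between super-resolved and ground-truth pixels
    # (flattened sequences of equal length).
    return sum(abs(a - b) for a, b in zip(sr, hr)) / len(sr)

def joint_loss(l_pix, l_style, l_ssim, w_style=1.0, w_ssim=1.0):
    # Weighted combination of the three terms from Table 3.
    # The weights here are assumptions for illustration.
    return l_pix + w_style * l_style + w_ssim * l_ssim
```

In practice the style term would come from Gram-matrix feature statistics and the SSIM term from 1 − SSIM(sr, hr), with weights tuned per dataset.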
Fig. 8  Effect of the SE module and the ECA module on PSNR and SSIM
Upscale factor | Bicubic       | SRGAN         | FSRNet        | SPSR          | EIPNet        | TransSRNet (Ours)
2              | 34.942, 0.955 | 35.831, 0.962 | 37.699, 0.971 | 37.729, 0.966 | 37.899, 0.972 | 38.930, 0.975
3              | 31.130, 0.901 | 32.400, 0.920 | 33.526, 0.935 | 33.268, 0.921 | 33.942, 0.940 | 35.227, 0.949
4              | 28.999, 0.850 | 30.034, 0.871 | 32.338, 0.916 | 30.580, 0.872 | 32.569, 0.919 | 33.215, 0.925
8              | 24.531, 0.698 | 25.278, 0.717 | 26.934, 0.795 | 25.579, 0.722 | 26.898, 0.791 | 28.726, 0.844
(each cell: PSNR/dB, SSIM)
Table 4  Comparison of different methods on the Helen dataset
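For reference, the PSNR and SSIM figures reported in Tables 4 and 5 can be computed as below. This is a simplified sketch assuming 8-bit images given as plain Python lists; the SSIM here is the single-window global form rather than the usual Gaussian sliding-window variant:

```python
import math

def psnr(ref, test, max_val=255.0):
    # Peak signal-to-noise ratio between two equal-sized images (nested lists).
    flat_r = [p for row in ref for p in row]
    flat_t = [p for row in test for p in row]
    mse = sum((r - t) ** 2 for r, t in zip(flat_r, flat_t)) / len(flat_r)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    # Simplified single-window SSIM over flattened pixel sequences.
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * max_val) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Higher is better for both metrics: PSNR is unbounded (infinite for identical images), while SSIM peaks at 1.0.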
Fig. 9  Subjective comparison on the Helen dataset at upscale factors of 2, 3, 4 and 8
Upscale factor | Bicubic       | SRGAN         | RNAN          | SPSR          | NLSN          | TransSRNet (Ours)
2              | 30.111, 0.937 | 31.838, 0.942 | 36.223, 0.976 | 34.917, 0.969 | 36.514, 0.977 | 36.378, 0.976
3              | 27.310, 0.889 | 28.635, 0.909 | 31.764, 0.951 | 30.648, 0.941 | 31.840, 0.952 | 32.829, 0.956
4              | 25.751, 0.852 | 26.357, 0.875 | 29.467, 0.928 | 28.092, 0.914 | 29.552, 0.929 | 30.449, 0.936
8              | 22.872, 0.774 | 23.005, 0.802 | 24.546, 0.838 | 23.677, 0.828 | 24.597, 0.840 | 26.392, 0.881
(each cell: PSNR/dB, SSIM)
Table 5  Comparison of different methods on the medical CT dataset
Fig. 10  Subjective comparison on the medical CT dataset at an upscale factor of 2
Fig. 11  Subjective comparison on the medical CT dataset at an upscale factor of 3
Fig. 12  Subjective comparison on the medical CT dataset at an upscale factor of 4
Fig. 13  Subjective comparison on the medical CT dataset at an upscale factor of 8