Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (5): 865-874    DOI: 10.3785/j.issn.1008-973X.2023.05.002
Structured image super-resolution network based on improved Transformer
Xin-dong LV, Jiao LI, Zhen-nan DENG, Hao FENG, Xin-tong CUI, Hong-xia DENG*
College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China

Abstract  

Most existing structured-image super-resolution algorithms can solve only a single specific type of structured-image super-resolution problem. A structured-image super-resolution network based on an improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of the Transformer to mine wide-range global information in spatial sequences. A spatial attention unit was built with an hourglass block structure, which captured the mapping relationship between the low-resolution space and the high-resolution space in local regions and extracted the structured information in the image mapping process. An efficient channel attention module was used to fuse the features of the self-attention module and the spatial attention module. TransSRNet was evaluated on the highly structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. The evaluation results showed that TransSRNet achieved better overall performance than mainstream super-resolution algorithms. With an upscale factor of 8, the PSNR on the face dataset and the medical-image dataset reached 28.726 and 26.392 dB respectively, and the SSIM reached 0.844 and 0.881 respectively.
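As a rough illustration of the global modelling the abstract refers to, the following minimal NumPy sketch implements plain scaled dot-product self-attention over a flattened spatial sequence. The token count, feature width, and random projection matrices are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: (n, d) sequence of n tokens with d features.
    w_q, w_k, w_v: (d, d) query/key/value projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise similarities
    # softmax over the key axis: every position attends to every other position,
    # which is what gives the Transformer its wide-range global receptive field
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                               # (n, d) globally mixed features

rng = np.random.default_rng(0)
n, d = 16, 8                       # e.g. 16 flattened spatial positions, 8 channels
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                   # (16, 8)
```

In the actual network the sequence would come from flattening feature-map positions, and the projections would be learned; this sketch only shows the attention arithmetic itself.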



Key words: convolutional neural network; Transformer; self-attention; spatial attention; image super-resolution reconstruction
Received: 25 July 2022      Published: 09 May 2023
CLC:  TP 391  
Fund: Shanxi Provincial Central-Government-Guided Local Science and Technology Development Fund (YDZJSX2021C005, YDZJSX2022A016); 2022 Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A2221)
Corresponding Authors: Hong-xia DENG     E-mail: 865877436@qq.com;denghongxia@tyut.edu.cn
Cite this article:
Xin-dong LV, Jiao LI, Zhen-nan DENG, Hao FENG, Xin-tong CUI, Hong-xia DENG. Structured image super-resolution network based on improved Transformer. Journal of ZheJiang University (Engineering Science), 2023, 57(5): 865-874.
URL: https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.05.002 or https://www.zjujournals.com/eng/Y2023/V57/I5/865


Fig.1 Hourglass block diagram
Fig.2 TransSRNet structure diagram
Fig.3 Spatial attention unit structure diagram
Fig.4 Encoder and Decoder structure diagram
Fig.5 Residual Transformer block structure diagram
Fig.6 ECA module structure diagram
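For orientation on Fig.6: an ECA module (after Wang et al.'s ECA-Net, ref. [19]) gates each channel with a sigmoid weight computed by a 1-D convolution over globally averaged channel descriptors, avoiding the dimensionality reduction of SE blocks. The NumPy sketch below is an approximation: in the real module the 1-D kernel is learned, whereas here a fixed averaging kernel of assumed size k=3 stands in.

```python
import numpy as np

def eca(feature_map, k=3):
    """Efficient channel attention: rescale each channel by a sigmoid gate
    derived from a 1-D convolution across neighbouring channel averages.

    feature_map: (c, h, w) array; k: odd 1-D convolution kernel size.
    """
    pooled = feature_map.mean(axis=(1, 2))             # (c,) global average pooling
    kernel = np.full(k, 1.0 / k)                       # fixed stand-in for a learned kernel
    padded = np.pad(pooled, k // 2, mode="edge")       # keep length c after convolution
    conv = np.convolve(padded, kernel, mode="valid")   # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))                 # (c,) sigmoid channel weights
    return feature_map * gate[:, None, None]           # reweight each channel map

x = np.random.default_rng(1).standard_normal((8, 4, 4))
y = eca(x)
print(y.shape)  # (8, 4, 4)
```

The key design point is that the gate for a channel depends only on k neighbouring channel descriptors, so the parameter count stays tiny regardless of channel width.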
Fig.7 Effects of different numbers of spatial attention units on PSNR and SSIM
N_T | PSNR/dB | SSIM
2   | 28.479  | 0.838
4   | 28.535  | 0.839
6   | 28.568  | 0.839
8   | 28.343  | 0.834
Tab.1 Effects of different numbers of residual Transformer blocks on PSNR and SSIM
Experiment | PSNR/dB | SSIM
1          | 28.341  | 0.834
2          | 26.089  | 0.763
3          | 28.568  | 0.839
Tab.2 Effects of retaining different attention modules on PSNR and SSIM
Loss function            | PSNR/dB | SSIM
l_pix                    | 28.568  | 0.839
l_pix + l_style          | 28.598  | 0.839
l_pix + l_style + l_ssim | 28.632  | 0.841
Tab.3 Effects of different joint loss functions on PSNR and SSIM
Fig.8 Effects of SE module and ECA module on PSNR and SSIM
Upscale factor | Bicubic | SRGAN | FSRNet | SPSR | EIPNet | TransSRNet (Ours)
(each cell: PSNR/dB, SSIM)
2 | 34.942, 0.955 | 35.831, 0.962 | 37.699, 0.971 | 37.729, 0.966 | 37.899, 0.972 | 38.930, 0.975
3 | 31.130, 0.901 | 32.400, 0.920 | 33.526, 0.935 | 33.268, 0.921 | 33.942, 0.940 | 35.227, 0.949
4 | 28.999, 0.850 | 30.034, 0.871 | 32.338, 0.916 | 30.580, 0.872 | 32.569, 0.919 | 33.215, 0.925
8 | 24.531, 0.698 | 25.278, 0.717 | 26.934, 0.795 | 25.579, 0.722 | 26.898, 0.791 | 28.726, 0.844
Tab.4 Comparison results of different methods on Helen dataset
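The PSNR values reported in these comparison tables follow the standard definition, 10·log10(peak²/MSE). A minimal NumPy sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")      # identical images: unbounded PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 16.0)        # constant error of 16 gives MSE = 256
print(round(psnr(a, b), 3))      # 10*log10(255**2 / 256) ≈ 24.048
```

Higher is better; a gain of roughly 1 dB, as TransSRNet shows at the 8x factor, corresponds to about a 21% reduction in mean squared error.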
Fig.9 Comparison of subjective effects with upscale factors of 2, 3, 4 and 8 on Helen dataset
Upscale factor | Bicubic | SRGAN | RNAN | SPSR | NLSN | TransSRNet (Ours)
(each cell: PSNR/dB, SSIM)
2 | 30.111, 0.937 | 31.838, 0.942 | 36.223, 0.976 | 34.917, 0.969 | 36.514, 0.977 | 36.378, 0.976
3 | 27.310, 0.889 | 28.635, 0.909 | 31.764, 0.951 | 30.648, 0.941 | 31.840, 0.952 | 32.829, 0.956
4 | 25.751, 0.852 | 26.357, 0.875 | 29.467, 0.928 | 28.092, 0.914 | 29.552, 0.929 | 30.449, 0.936
8 | 22.872, 0.774 | 23.005, 0.802 | 24.546, 0.838 | 23.677, 0.828 | 24.597, 0.840 | 26.392, 0.881
Tab.5 Comparison results of different methods on medical CT dataset
Fig.10 Comparison of subjective effects with upscale factor of 2 on medical CT dataset
Fig.11 Comparison of subjective effects with upscale factor of 3 on medical CT dataset
Fig.12 Comparison of subjective effects with upscale factor of 4 on medical CT dataset
Fig.13 Comparison of subjective effects with upscale factor 8 on medical CT dataset