Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (5): 865-874    DOI: 10.3785/j.issn.1008-973X.2023.05.002
Structured image super-resolution network based on improved Transformer
Xin-dong LV, Jiao LI, Zhen-nan DENG, Hao FENG, Xin-tong CUI, Hong-xia DENG*
College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China

Abstract  

Most existing structured-image super-resolution algorithms can solve only a single specific type of structured-image super-resolution problem. A structured-image super-resolution network based on an improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of the Transformer to mine wide-range global information in spatial sequences. A spatial attention unit was built with an hourglass block structure, which captured the mapping relationship between the low-resolution space and the high-resolution space in local regions and extracted the structured information in the image mapping process. An efficient channel attention module was used to fuse the features of the self-attention module and the spatial attention module. TransSRNet was evaluated on the highly structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. The evaluation results showed that TransSRNet achieved better overall performance than mainstream super-resolution algorithms. With an upscale factor of 8, the PSNR on the face dataset and the medical-image dataset reached 28.726 and 26.392 dB respectively, and the SSIM reached 0.844 and 0.881 respectively.
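As a rough illustration of the global modelling the abstract refers to, the following minimal NumPy sketch implements plain scaled dot-product self-attention over a flattened spatial sequence. The token count, feature width, and random projection matrices are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: (n, d) sequence of n tokens with d features.
    w_q, w_k, w_v: (d, d) query/key/value projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise similarities
    # softmax over the key axis: every position attends to every other position,
    # which is what gives the Transformer its wide-range global receptive field
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                               # (n, d) globally mixed features

rng = np.random.default_rng(0)
n, d = 16, 8                       # e.g. 16 flattened spatial positions, 8 channels
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                   # (16, 8)
```

In the actual network the sequence would come from flattening feature-map positions, and the projections would be learned; this sketch only shows the attention arithmetic itself.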



Key words: convolutional neural network; Transformer; self-attention; spatial attention; image super-resolution reconstruction
Received: 25 July 2022      Published: 09 May 2023
CLC:  TP 391  
Fund: Shanxi Provincial Central-Government-Guided Local Science and Technology Development Fund (YDZJSX2021C005, YDZJSX2022A016); 2022 Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A2221)
Corresponding Authors: Hong-xia DENG     E-mail: 865877436@qq.com;denghongxia@tyut.edu.cn
Cite this article:
Xin-dong LV, Jiao LI, Zhen-nan DENG, Hao FENG, Xin-tong CUI, Hong-xia DENG. Structured image super-resolution network based on improved Transformer. Journal of ZheJiang University (Engineering Science), 2023, 57(5): 865-874.
URL: https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.05.002 or https://www.zjujournals.com/eng/Y2023/V57/I5/865


Fig.1 Hourglass block diagram
Fig.2 TransSRNet structure diagram
Fig.3 Spatial attention unit structure diagram
Fig.4 Encoder and Decoder structure diagram
Fig.5 Residual Transformer block structure diagram
Fig.6 ECA module structure diagram
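For orientation on Fig.6: an ECA module (after Wang et al.'s ECA-Net, ref. [19]) gates each channel with a sigmoid weight computed by a 1-D convolution over globally averaged channel descriptors, avoiding the dimensionality reduction of SE blocks. The NumPy sketch below is an approximation: in the real module the 1-D kernel is learned, whereas here a fixed averaging kernel of assumed size k=3 stands in.

```python
import numpy as np

def eca(feature_map, k=3):
    """Efficient channel attention: rescale each channel by a sigmoid gate
    derived from a 1-D convolution across neighbouring channel averages.

    feature_map: (c, h, w) array; k: odd 1-D convolution kernel size.
    """
    pooled = feature_map.mean(axis=(1, 2))             # (c,) global average pooling
    kernel = np.full(k, 1.0 / k)                       # fixed stand-in for a learned kernel
    padded = np.pad(pooled, k // 2, mode="edge")       # keep length c after convolution
    conv = np.convolve(padded, kernel, mode="valid")   # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))                 # (c,) sigmoid channel weights
    return feature_map * gate[:, None, None]           # reweight each channel map

x = np.random.default_rng(1).standard_normal((8, 4, 4))
y = eca(x)
print(y.shape)  # (8, 4, 4)
```

The key design point is that the gate for a channel depends only on k neighbouring channel descriptors, so the parameter count stays tiny regardless of channel width.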
Fig.7 Effects of different numbers of spatial attention units on PSNR and SSIM
N_T | PSNR/dB | SSIM
2   | 28.479  | 0.838
4   | 28.535  | 0.839
6   | 28.568  | 0.839
8   | 28.343  | 0.834
Tab.1 Effects of different numbers of residual Transformer blocks on PSNR and SSIM
Experiment | PSNR/dB | SSIM
1          | 28.341  | 0.834
2          | 26.089  | 0.763
3          | 28.568  | 0.839
Tab.2 Effects of retaining different attention modules on PSNR and SSIM
Loss function            | PSNR/dB | SSIM
l_pix                    | 28.568  | 0.839
l_pix + l_style          | 28.598  | 0.839
l_pix + l_style + l_ssim | 28.632  | 0.841
Tab.3 Effects of different joint loss functions on PSNR and SSIM
Fig.8 Effects of SE module and ECA module on PSNR and SSIM
Upscale factor | Bicubic | SRGAN | FSRNet | SPSR | EIPNet | TransSRNet (Ours)
(each cell: PSNR/dB, SSIM)
2 | 34.942, 0.955 | 35.831, 0.962 | 37.699, 0.971 | 37.729, 0.966 | 37.899, 0.972 | 38.930, 0.975
3 | 31.130, 0.901 | 32.400, 0.920 | 33.526, 0.935 | 33.268, 0.921 | 33.942, 0.940 | 35.227, 0.949
4 | 28.999, 0.850 | 30.034, 0.871 | 32.338, 0.916 | 30.580, 0.872 | 32.569, 0.919 | 33.215, 0.925
8 | 24.531, 0.698 | 25.278, 0.717 | 26.934, 0.795 | 25.579, 0.722 | 26.898, 0.791 | 28.726, 0.844
Tab.4 Comparison results of different methods on Helen dataset
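The PSNR values reported in these comparison tables follow the standard definition, 10·log10(peak²/MSE). A minimal NumPy sketch, assuming 8-bit images with peak value 255:

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")      # identical images: unbounded PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
b = np.full((4, 4), 16.0)        # constant error of 16 gives MSE = 256
print(round(psnr(a, b), 3))      # 10*log10(255**2 / 256) ≈ 24.048
```

Higher is better; a gain of roughly 1 dB, as TransSRNet shows at the 8x factor, corresponds to about a 21% reduction in mean squared error.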
Fig.9 Comparison of subjective effects with upscale factors of 2, 3, 4 and 8 on Helen dataset
Upscale factor | Bicubic | SRGAN | RNAN | SPSR | NLSN | TransSRNet (Ours)
(each cell: PSNR/dB, SSIM)
2 | 30.111, 0.937 | 31.838, 0.942 | 36.223, 0.976 | 34.917, 0.969 | 36.514, 0.977 | 36.378, 0.976
3 | 27.310, 0.889 | 28.635, 0.909 | 31.764, 0.951 | 30.648, 0.941 | 31.840, 0.952 | 32.829, 0.956
4 | 25.751, 0.852 | 26.357, 0.875 | 29.467, 0.928 | 28.092, 0.914 | 29.552, 0.929 | 30.449, 0.936
8 | 22.872, 0.774 | 23.005, 0.802 | 24.546, 0.838 | 23.677, 0.828 | 24.597, 0.840 | 26.392, 0.881
Tab.5 Comparison results of different methods on medical CT dataset
Fig.10 Comparison of subjective effects with upscale factor of 2 on medical CT dataset
Fig.11 Comparison of subjective effects with upscale factor of 3 on medical CT dataset
Fig.12 Comparison of subjective effects with upscale factor of 4 on medical CT dataset
Fig.13 Comparison of subjective effects with upscale factor 8 on medical CT dataset