Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (12): 2516-2526    DOI: 10.3785/j.issn.1008-973X.2025.12.006
    
Image super-resolution reconstruction method driven by two-dimensional cross-fusion
Xiaofen JIA1,2, Zixiang WANG3, Baiting ZHAO3, Zhenhuan LIANG2, Rui HU2
1. State Key Laboratory of Digital Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, China
2. Institute of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, China
3. Institute of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China

Abstract  

Existing image super-resolution models do not sufficiently extract the low-level features embedded in the deep semantic information of an image, which causes loss of detail in the reconstructed image. An image super-resolution model driven by cross-fusion along the spatial and channel dimensions was therefore proposed. The model used the Transformer attention mechanism to build spatial intensive global attention (SIGA) in the spatial dimension to capture the positional relationships of deep spatial regions, and channel cross attention (CCA) in the channel dimension to capture the feature dependencies between channels. SIGA and CCA were each placed in parallel with depthwise separable convolutions to strengthen the extraction of low-level features from high-level semantic information. Meanwhile, a cross fusion block (CFB) based on a spatial compression strategy was developed to ensure efficient fusion of fine-grained features between the attention modules and the depthwise separable convolutions. Cascading the two-dimensional cross-fusion modules promoted the thorough intersection and aggregation of deep semantic information, so that delicate structures in the image were restored. Experimental results showed that, at a scale factor of 4, the proposed model improved PSNR by 0.52 dB on Urban100 and 0.81 dB on Manga109 compared with the latest method BiGLFE.
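To make the parallel-branch design described in the abstract more concrete, the following is a minimal PyTorch-style sketch of one of the two dimensions: a channel attention branch (attention computed across channels on a spatially compressed feature map, loosely in the spirit of CCA) running in parallel with a depthwise separable convolution, with the two streams fused by a 1×1 convolution and a residual connection. All module names (DepthwiseSeparableConv, ChannelAttentionBranch, ParallelFusionBlock), the pooled token size, and the fusion details are illustrative assumptions, not the paper's reference implementation; the spatial branch (SIGA) would follow the same pattern with attention over spatial positions instead of channels.

```python
# Hedged sketch of a "channel attention || depthwise separable convolution" block
# with 1x1 fusion; module names and dimensions are assumptions for exposition.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class ChannelAttentionBranch(nn.Module):
    """Transformer-style attention over the channel dimension (each channel is a token)."""
    def __init__(self, channels: int, pooled_size: int = 8):
        super().__init__()
        # Spatial compression: pool the feature map so each channel token stays small.
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)
        dim = pooled_size * pooled_size
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=1, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.pool(x).flatten(2)                 # (B, C, pooled*pooled)
        attended, _ = self.attn(tokens, tokens, tokens)  # channel-to-channel attention
        weights = torch.sigmoid(attended.mean(dim=2))    # (B, C) per-channel weights
        return x * weights.view(b, c, 1, 1)


class ParallelFusionBlock(nn.Module):
    """Attention branch and convolution branch in parallel, fused by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn_branch = ChannelAttentionBranch(channels)
        self.conv_branch = DepthwiseSeparableConv(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        fused = self.fuse(torch.cat([self.attn_branch(x), self.conv_branch(x)], dim=1))
        return x + fused  # residual connection preserves low-level detail


if __name__ == "__main__":
    block = ParallelFusionBlock(64)
    print(block(torch.randn(1, 64, 48, 48)).shape)  # torch.Size([1, 64, 48, 48])
```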



Key words: image super-resolution, Transformer, CNN, fusion, spatial attention, channel attention
Received: 03 December 2024      Published: 25 November 2025
CLC:  TP 391  
Fund: National Natural Science Foundation of China (52174141); Natural Science Foundation of Anhui Province (2108085ME158); Research Project of the Joint Research Center for Occupational Medicine and Health, Institute of Health and Medicine, Hefei Comprehensive National Science Center (OMH-2023-10); Scientific Research Start-up Fund for Introduced Talents of Anhui University of Science and Technology (2022yjrc44).
Cite this article:

Xiaofen JIA, Zixiang WANG, Baiting ZHAO, Zhenhuan LIANG, Rui HU. Image super-resolution reconstruction method driven by two-dimensional cross-fusion. Journal of ZheJiang University (Engineering Science), 2025, 59(12): 2516-2526.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.12.006     OR     https://www.zjujournals.com/eng/Y2025/V59/I12/2516


Fig.1 Overall network structure of CGT
Fig.2 Structure of spatial intensive global attention (SIGA)
Fig.3 Structure of channel cross attention (CCA)
Fig.4 Structure of cross fusion block (CFB)
Fig.5 Comparative test of different numbers of CGTB on Manga109 (×4)
N_CGTB   PM/10^6   FL/10^9
2        4.203     20.132
4        7.708     35.485
6        11.212    50.838
8        14.717    66.19
10       18.221    81.543
Tab.1 Parameters for different quantities of CGTB
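For context, the "PM/10^6" column in Tab.1 is the number of trainable parameters in millions; the sketch below shows how such a count is commonly obtained for a PyTorch model. "FL/10^9" (FLOPs in billions) is usually measured with a separate profiling tool; the paper does not state which tool was used, so this snippet is illustrative only and the toy model is a hypothetical placeholder.

```python
# Hedged sketch: counting trainable parameters of a PyTorch model in units of 10^6.
import torch.nn as nn


def params_in_millions(model: nn.Module) -> float:
    """Count trainable parameters and report them in millions (PM/10^6)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


if __name__ == "__main__":
    # Toy two-layer convolutional model, used only to demonstrate the counting.
    toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 3, 3, padding=1))
    print(f"{params_in_millions(toy):.3f} M parameters")
```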
Connection            N_SC   PM/10^6   FL/10^9   Set5           Set14          BSD100         Urban100
(each dataset column lists PSNR/dB / SSIM)
SGB+CGB in series     2      8.29      35.060    32.66/0.9002   28.93/0.7896   27.78/0.7439   26.87/0.8094
SGB+CGB in series     3      11.21     50.838    32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155
SGB+CGB in series     4      18.91     77.745    32.70/0.9012   28.96/0.7906   27.82/0.7449   27.01/0.8128
SGB+CGB in parallel   2      11.71     48.321    32.62/0.9001   28.95/0.7893   27.78/0.7436   26.88/0.8092
SGB+CGB in parallel   3      16.35     66.509    32.66/0.9007   28.97/0.7899   27.81/0.7441   26.94/0.8110
SGB+CGB in parallel   4      20.98     84.698    32.63/0.9002   28.90/0.7887   27.78/0.7433   26.81/0.8071
Tab.2 Performance tests for series and parallel connections with different quantities of SGB+CGB
Method    SIGA   CCA   CFB   PM/10^6   FL/10^9   Set5           Set14          BSD100         Urban100
(each dataset column lists PSNR/dB / SSIM; √/× = module included/excluded)
CGT-S     √      ×     ×     11.21     52.82     32.61/0.9003   28.94/0.7899   27.79/0.7444   26.89/0.8105
CGT-SF    √      ×     √     11.21     53.44     32.62/0.9008   28.95/0.7902   27.80/0.7446   26.94/0.8114
CGT-C     ×      √     ×     11.20     47.61     32.52/0.8991   28.81/0.7865   27.71/0.7412   26.57/0.7996
CGT-CF    ×      √     √     11.20     48.23     32.53/0.8990   28.82/0.7871   27.72/0.7412   26.58/0.8006
CGT-SC    √      √     ×     11.21     50.22     32.63/0.9006   28.94/0.7897   27.80/0.7443   26.84/0.8081
CGT       √      √     √     11.21     50.83     32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155
Tab.3 Experimental results of five ablated CGT variants
Fig.6 Visualization of features extracted from different modules
Fig.7 Comparison of parameter quantities on Urban100 (×4)
Method           Year   Scale   Set5           Set14          BSD100         Urban100       Manga109
(each dataset column lists PSNR/dB / SSIM; – = not reported)
IMDN[17]         2019   ×2      38.00/0.9605   33.63/0.9177   32.19/0.8996   32.17/0.9283   –
HAN[19]          2019   ×2      38.27/0.9614   34.16/0.9217   32.41/0.9027   33.35/0.9385   39.46/0.9785
SAN[32]          2020   ×2      38.31/0.9620   34.07/0.9213   32.42/0.9028   33.10/0.9370   39.32/0.9792
RFANET[33]       2020   ×2      38.26/0.9615   34.16/0.9220   32.41/0.9026   33.33/0.9389   39.44/0.9783
NLSN[34]         2021   ×2      38.34/0.9618   34.08/0.9231   32.43/0.9027   33.42/0.9394   39.59/0.9789
EMT[35]          2024   ×2      38.29/0.9615   34.23/0.9229   32.40/0.9027   33.28/0.9385   39.59/0.9789
BiGLFE[36]       2024   ×2      38.15/0.9601   33.80/0.9194   32.29/0.8994   32.71/0.9329   38.96/0.9771
CMSN[37]         2024   ×2      38.18/0.9612   33.84/0.9195   32.30/0.9014   32.65/0.9329   39.11/0.9780
CGT (proposed)   2024   ×2      38.36/0.9618   34.11/0.9220   32.41/0.9026   33.48/0.9395   39.67/0.9792
IMDN[17]         2019   ×3      34.36/0.9270   30.32/0.8417   29.09/0.8046   28.17/0.8519   33.61/0.9445
HAN[19]          2019   ×3      34.75/0.9299   30.67/0.8483   29.32/0.8110   29.10/0.8705   34.48/0.9500
SAN[32]          2020   ×3      34.75/0.9300   30.59/0.8476   29.33/0.8112   28.93/0.8671   34.30/0.9494
RFANET[33]       2020   ×3      34.79/0.9300   30.67/0.8487   29.34/0.8115   29.15/0.8720   34.59/0.9506
NLSN[34]         2021   ×3      34.85/0.9306   30.70/0.8485   29.34/0.8117   29.25/0.8726   34.57/0.9508
EMT[35]          2024   ×3      34.80/0.9303   30.71/0.8489   29.33/0.8113   29.16/0.8716   34.65/0.9508
BiGLFE[36]       2024   ×3      34.59/0.9276   30.33/0.8449   29.24/0.8059   28.76/0.8642   34.03/0.9460
CMSN[37]         2024   ×3      34.62/0.9288   30.50/0.8452   29.22/0.8082   28.60/0.8612   34.12/0.9476
CGT (proposed)   2024   ×3      34.91/0.9308   30.75/0.8496   29.36/0.8119   29.26/0.8729   34.77/0.9514
IMDN[17]         2019   ×4      32.21/0.8948   28.58/0.7811   27.56/0.7353   26.04/0.7838   –
HAN[19]          2019   ×4      32.59/0.9000   28.87/0.7891   27.78/0.7444   26.96/0.8109   31.27/0.9184
SAN[32]          2020   ×4      32.64/0.9003   28.92/0.7888   27.78/0.7436   26.79/0.8068   31.18/0.9169
RFANET[33]       2020   ×4      32.66/0.9004   28.88/0.7894   27.79/0.7442   26.92/0.8112   31.41/0.9187
NLSN[34]         2021   ×4      32.64/0.9002   28.90/0.7890   27.80/0.7442   26.85/0.8094   31.42/0.9177
EMT[35]          2024   ×4      32.64/0.9003   28.97/0.7901   27.81/0.7441   26.98/0.8118   31.48/0.9190
BiGLFE[36]       2024   ×4      32.52/0.8971   28.64/0.7858   27.74/0.7377   26.60/0.8016   31.00/0.9123
CMSN[37]         2024   ×4      32.41/0.8975   28.77/0.7851   27.68/0.7398   26.44/0.7964   31.00/0.9133
CGT (proposed)   2024   ×4      32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155   31.81/0.9224
Tab.4 Comparison with advanced methods on five benchmark datasets
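For reference, the PSNR values in Tab.4 follow the standard definition of peak signal-to-noise ratio, 10·log10(MAX²/MSE), which super-resolution papers commonly evaluate on the luminance (Y) channel of images in the range [0, 255]; SSIM is computed as defined by Wang et al. [30]. The NumPy sketch below only illustrates the PSNR metric and is not code from the paper; the function name and the toy data are assumptions.

```python
# Hedged sketch of the PSNR metric (10*log10(MAX^2 / MSE)) used in Tab.4.
import numpy as np


def psnr(sr: np.ndarray, hr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a super-resolved image and its ground truth."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)


if __name__ == "__main__":
    # Toy example: a ground-truth patch and a slightly perturbed "reconstruction".
    hr = np.random.randint(0, 256, (48, 48, 3))
    sr = np.clip(hr + np.random.randint(-5, 6, hr.shape), 0, 255)
    print(f"PSNR: {psnr(sr, hr):.2f} dB")
```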
Fig.8 Visual comparison of reconstructed images on Set14 (×2)
Fig.9 Visual comparison of reconstructed images on BSD100 (×4)
Fig.10 Visual comparison of reconstructed images on Urban100 (×4)
[1]   DENG C, LUO X, WANG W. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 8389–8401
doi: 10.1109/JSTARS.2022.3207777
[2]   MA J, LIU S, CHENG S, et al. STSRNet: self-texture transfer super-resolution and refocusing network[J]. IEEE Transactions on Medical Imaging, 2022, 41(2): 383–393
doi: 10.1109/TMI.2021.3112923
[3]   ZHAO Z, ZHANG Y, LI C, et al. Thermal UAV image super-resolution guided by multiple visible cues[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5000314
[4]   DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295–307
doi: 10.1109/TPAMI.2015.2439281
[5]   KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1646–1654.
[6]   ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
[7]   HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132–7141.
[8]   LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1132–1140.
[9]   VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. California: Curran Associates Inc., 2017: 6000–6010.
[10]   LIANG J, CAO J, SUN G, et al. SwinIR: image restoration using swin transformer [C]// IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 1833–1844.
[11]   CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 9355–9366
[12]   YANG R, MA H, WU J, et al. ScalableViT: rethinking the context-oriented generalization of vision transformer [C]// Proceedings of Computer Vision – ECCV 2022. Cham: Springer, 2022: 480–496.
[13]   CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers [C]// Proceedings of Computer Vision – ECCV 2020. Cham: Springer, 2020: 213–229.
[14]   BEAL J, KIM E, TZENG E, et al. Toward Transformer-Based Object Detection [EB/OL]. (2020−12−17) [2025−09−15]. https://doi.org/10.48550/arXiv.2012.09958.
[15]   PENG Z, HUANG W, GU S, et al. Conformer: local features coupling global representations for visual recognition [C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 357−366.
[16]   WANG H, CHEN X, NI B, et al. Omni aggregation networks for lightweight image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 22378–22387.
[17]   HUI Z, GAO X, YANG Y, et al. Lightweight image super-resolution with information multi-distillation network [C]// 27th ACM International Conference on Multimedia. [S.l.]: ACM, 2019: 2024−2032.
[18]   JI J, ZHONG B, WU Q, et al. A channel-wise multi-scale network for single image super-resolution[J]. IEEE Signal Processing Letters, 2024, 31: 805–809
doi: 10.1109/LSP.2024.3372781
[19]   WANG X, JI H, SHI C, et al. Heterogeneous Graph Attention Network [C]// The World Wide Web Conference. San Francisco: [s.n.], 2019: 2022–2032.
[20]   LI X, DONG J, TANG J, et al. DLGSANet: lightweight dynamic local and global self-attention network for image super-resolution [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 12746–12755.
[21]   LIN H, CHENG X, WU X, et al. CAT: cross attention in vision transformer [C]// IEEE International Conference on Multimedia and Expo. Taipei: IEEE, 2022: 1–6.
[22]   CHEN Z, ZHANG Y, GU J, et al. Recursive Generalization Transformer for Image Super-Resolution [EB/OL]. (2023−03−11) [2025−09−15]. https://doi.org/10.48550/arXiv.2303.06373.
[23]   KINGMA D, BA J. Adam: a method for stochastic optimization [EB/OL]. (2014−12−14) [2025−09−15]. https://doi.org/10.48550/arXiv.1412.6980.
[24]   AGUSTSSON E, TIMOFTE R. NTIRE 2017 challenge on single image super-resolution: dataset and study [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1122–1131.
[25]   BEVILACQUA M, ROUMY A, GUILLEMOT C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding [C]// Proceedings of British Machine Vision Conference. Surrey: British Machine Vision Association, 2012.
[26]   ZEYDE R, ELAD M, PROTTER M. On single image scale-up using sparse-representations [C]// Proceedings of International Conference on Curves and Surfaces. Avignon: Springer, 2010: 711–730.
[27]   MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics [C]// 8th IEEE International Conference on Computer Vision. Vancouver: IEEE, 2001: 416–423.
[28]   HUANG J B, SINGH A, AHUJA N. Single image super-resolution from transformed self-exemplars [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 5197–5206.
[29]   MATSUI Y, ITO K, ARAMAKI Y, et al. Sketch-based manga retrieval using manga109 dataset[J]. Multimedia Tools and Applications, 2017, 76(20): 21811–21838
doi: 10.1007/s11042-016-4020-z
[30]   WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600–612
doi: 10.1109/TIP.2003.819861
[31]   ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
[32]   LYN J, YAN S. Non-local Second-order attention network for single image super resolution [C]// Proceedings of International Cross-Domain Conference for Machine Learning and Knowledge Extraction. Dublin: Springer, 2020: 267–279.
[33]   LIU J, ZHANG W, TANG Y, et al. Residual feature aggregation network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2356–2365.
[34]   MEI Y, FAN Y, ZHOU Y. Image super-resolution with non-local sparse attention [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3516–3525.
[35]   ZHENG L, ZHU J, SHI J, et al. Efficient mixed transformer for single image super-resolution[J]. Engineering Applications of Artificial Intelligence, 2024, 133: 108035
doi: 10.1016/j.engappai.2024.108035
[36]   HWANG K, YOON G, SONG J, et al. Fusing bi-directional global–local features for single image super-resolution[J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107336
doi: 10.1016/j.engappai.2023.107336