Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (12): 2516-2526    DOI: 10.3785/j.issn.1008-973X.2025.12.006
    
Image super-resolution reconstruction method driven by two-dimensional cross-fusion
Xiaofen JIA1,2, Zixiang WANG3, Baiting ZHAO3, Zhenhuan LIANG2, Rui HU2
1. State Key Laboratory of Digital Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, China
2. Institute of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, China
3. Institute of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China

Abstract  

Existing image super-resolution models do not sufficiently extract the low-level features embedded in the deep semantic information of an image, which causes loss of detail in the reconstructed image. An image super-resolution model driven by cross-fusion along the spatial and channel dimensions was therefore proposed. The model used the Transformer attention mechanism to build spatial intensive global attention (SIGA) in the spatial dimension to capture the positional relationships of deep spatial regions, and channel cross attention (CCA) in the channel dimension to capture the feature dependencies between channels. SIGA and CCA were each placed in parallel with depthwise separable convolutions to strengthen the extraction of low-level features from high-level semantic information. Meanwhile, a cross fusion block (CFB) based on a spatial compression strategy was developed to ensure efficient fusion of fine-grained features between the attention modules and the depthwise separable convolutions. Cascading the two-dimensional cross-fusion modules promoted the thorough intersection and aggregation of deep semantic information, so that delicate structures in the image were restored. Experimental results showed that, at a scale factor of 4, the proposed model improved PSNR by 0.52 dB on Urban100 and 0.81 dB on Manga109 compared with the latest method BiGLFE.
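To make the parallel-branch design described in the abstract more concrete, the following is a minimal PyTorch-style sketch of one of the two dimensions: a channel attention branch (attention computed across channels on a spatially compressed feature map, loosely in the spirit of CCA) running in parallel with a depthwise separable convolution, with the two streams fused by a 1×1 convolution and a residual connection. All module names (DepthwiseSeparableConv, ChannelAttentionBranch, ParallelFusionBlock), the pooled token size, and the fusion details are illustrative assumptions, not the paper's reference implementation; the spatial branch (SIGA) would follow the same pattern with attention over spatial positions instead of channels.

```python
# Hedged sketch of a "channel attention || depthwise separable convolution" block
# with 1x1 fusion; module names and dimensions are assumptions for exposition.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class ChannelAttentionBranch(nn.Module):
    """Transformer-style attention over the channel dimension (each channel is a token)."""
    def __init__(self, channels: int, pooled_size: int = 8):
        super().__init__()
        # Spatial compression: pool the feature map so each channel token stays small.
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)
        dim = pooled_size * pooled_size
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=1, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.pool(x).flatten(2)                 # (B, C, pooled*pooled)
        attended, _ = self.attn(tokens, tokens, tokens)  # channel-to-channel attention
        weights = torch.sigmoid(attended.mean(dim=2))    # (B, C) per-channel weights
        return x * weights.view(b, c, 1, 1)


class ParallelFusionBlock(nn.Module):
    """Attention branch and convolution branch in parallel, fused by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.attn_branch = ChannelAttentionBranch(channels)
        self.conv_branch = DepthwiseSeparableConv(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        fused = self.fuse(torch.cat([self.attn_branch(x), self.conv_branch(x)], dim=1))
        return x + fused  # residual connection preserves low-level detail


if __name__ == "__main__":
    block = ParallelFusionBlock(64)
    print(block(torch.randn(1, 64, 48, 48)).shape)  # torch.Size([1, 64, 48, 48])
```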



Key words: image super-resolution, Transformer, CNN, fusion, spatial attention, channel attention
Received: 03 December 2024      Published: 25 November 2025
CLC:  TP 391  
Fund: National Natural Science Foundation of China (52174141); Natural Science Foundation of Anhui Province (2108085ME158); Research Project of the Joint Research Center for Occupational Medicine and Health, Institute of Health and Medicine, Hefei Comprehensive National Science Center (OMH-2023-10); Scientific Research Start-up Fund for Introduced Talents of Anhui University of Science and Technology (2022yjrc44).
Cite this article:

Xiaofen JIA, Zixiang WANG, Baiting ZHAO, Zhenhuan LIANG, Rui HU. Image super-resolution reconstruction method driven by two-dimensional cross-fusion. Journal of ZheJiang University (Engineering Science), 2025, 59(12): 2516-2526.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.12.006     OR     https://www.zjujournals.com/eng/Y2025/V59/I12/2516


Fig.1 Overall network structure of CGT
Fig.2 Structure of spatial intensive global attention (SIGA)
Fig.3 Structure of channel cross attention (CCA)
Fig.4 Structure of cross fusion block (CFB)
Fig.5 Comparative test of different numbers of CGTB on Manga109 (×4)
N_CGTB   PM/10^6   FL/10^9
2        4.203     20.132
4        7.708     35.485
6        11.212    50.838
8        14.717    66.19
10       18.221    81.543
Tab.1 Parameters for different quantities of CGTB
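For context, the "PM/10^6" column in Tab.1 is the number of trainable parameters in millions; the sketch below shows how such a count is commonly obtained for a PyTorch model. "FL/10^9" (FLOPs in billions) is usually measured with a separate profiling tool; the paper does not state which tool was used, so this snippet is illustrative only and the toy model is a hypothetical placeholder.

```python
# Hedged sketch: counting trainable parameters of a PyTorch model in units of 10^6.
import torch.nn as nn


def params_in_millions(model: nn.Module) -> float:
    """Count trainable parameters and report them in millions (PM/10^6)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


if __name__ == "__main__":
    # Toy two-layer convolutional model, used only to demonstrate the counting.
    toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 3, 3, padding=1))
    print(f"{params_in_millions(toy):.3f} M parameters")
```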
Connection            N_SC   PM/10^6   FL/10^9   Set5           Set14          BSD100         Urban100
(each dataset column lists PSNR/dB / SSIM)
SGB+CGB in series     2      8.29      35.060    32.66/0.9002   28.93/0.7896   27.78/0.7439   26.87/0.8094
SGB+CGB in series     3      11.21     50.838    32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155
SGB+CGB in series     4      18.91     77.745    32.70/0.9012   28.96/0.7906   27.82/0.7449   27.01/0.8128
SGB+CGB in parallel   2      11.71     48.321    32.62/0.9001   28.95/0.7893   27.78/0.7436   26.88/0.8092
SGB+CGB in parallel   3      16.35     66.509    32.66/0.9007   28.97/0.7899   27.81/0.7441   26.94/0.8110
SGB+CGB in parallel   4      20.98     84.698    32.63/0.9002   28.90/0.7887   27.78/0.7433   26.81/0.8071
Tab.2 Performance tests for series and parallel connections with different quantities of SGB+CGB
Method    SIGA   CCA   CFB   PM/10^6   FL/10^9   Set5           Set14          BSD100         Urban100
(each dataset column lists PSNR/dB / SSIM; √/× = module included/excluded)
CGT-S     √      ×     ×     11.21     52.82     32.61/0.9003   28.94/0.7899   27.79/0.7444   26.89/0.8105
CGT-SF    √      ×     √     11.21     53.44     32.62/0.9008   28.95/0.7902   27.80/0.7446   26.94/0.8114
CGT-C     ×      √     ×     11.20     47.61     32.52/0.8991   28.81/0.7865   27.71/0.7412   26.57/0.7996
CGT-CF    ×      √     √     11.20     48.23     32.53/0.8990   28.82/0.7871   27.72/0.7412   26.58/0.8006
CGT-SC    √      √     ×     11.21     50.22     32.63/0.9006   28.94/0.7897   27.80/0.7443   26.84/0.8081
CGT       √      √     √     11.21     50.83     32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155
Tab.3 Experimental results of five ablated CGT variants
Fig.6 Visualization of features extracted from different modules
Fig.7 Comparison of parameter quantities on Urban100 (×4)
Method           Year   Scale   Set5           Set14          BSD100         Urban100       Manga109
(each dataset column lists PSNR/dB / SSIM; – = not reported)
IMDN[17]         2019   ×2      38.00/0.9605   33.63/0.9177   32.19/0.8996   32.17/0.9283   –
HAN[19]          2019   ×2      38.27/0.9614   34.16/0.9217   32.41/0.9027   33.35/0.9385   39.46/0.9785
SAN[32]          2020   ×2      38.31/0.9620   34.07/0.9213   32.42/0.9028   33.10/0.9370   39.32/0.9792
RFANET[33]       2020   ×2      38.26/0.9615   34.16/0.9220   32.41/0.9026   33.33/0.9389   39.44/0.9783
NLSN[34]         2021   ×2      38.34/0.9618   34.08/0.9231   32.43/0.9027   33.42/0.9394   39.59/0.9789
EMT[35]          2024   ×2      38.29/0.9615   34.23/0.9229   32.40/0.9027   33.28/0.9385   39.59/0.9789
BiGLFE[36]       2024   ×2      38.15/0.9601   33.80/0.9194   32.29/0.8994   32.71/0.9329   38.96/0.9771
CMSN[37]         2024   ×2      38.18/0.9612   33.84/0.9195   32.30/0.9014   32.65/0.9329   39.11/0.9780
CGT (proposed)   2024   ×2      38.36/0.9618   34.11/0.9220   32.41/0.9026   33.48/0.9395   39.67/0.9792
IMDN[17]         2019   ×3      34.36/0.9270   30.32/0.8417   29.09/0.8046   28.17/0.8519   33.61/0.9445
HAN[19]          2019   ×3      34.75/0.9299   30.67/0.8483   29.32/0.8110   29.10/0.8705   34.48/0.9500
SAN[32]          2020   ×3      34.75/0.9300   30.59/0.8476   29.33/0.8112   28.93/0.8671   34.30/0.9494
RFANET[33]       2020   ×3      34.79/0.9300   30.67/0.8487   29.34/0.8115   29.15/0.8720   34.59/0.9506
NLSN[34]         2021   ×3      34.85/0.9306   30.70/0.8485   29.34/0.8117   29.25/0.8726   34.57/0.9508
EMT[35]          2024   ×3      34.80/0.9303   30.71/0.8489   29.33/0.8113   29.16/0.8716   34.65/0.9508
BiGLFE[36]       2024   ×3      34.59/0.9276   30.33/0.8449   29.24/0.8059   28.76/0.8642   34.03/0.9460
CMSN[37]         2024   ×3      34.62/0.9288   30.50/0.8452   29.22/0.8082   28.60/0.8612   34.12/0.9476
CGT (proposed)   2024   ×3      34.91/0.9308   30.75/0.8496   29.36/0.8119   29.26/0.8729   34.77/0.9514
IMDN[17]         2019   ×4      32.21/0.8948   28.58/0.7811   27.56/0.7353   26.04/0.7838   –
HAN[19]          2019   ×4      32.59/0.9000   28.87/0.7891   27.78/0.7444   26.96/0.8109   31.27/0.9184
SAN[32]          2020   ×4      32.64/0.9003   28.92/0.7888   27.78/0.7436   26.79/0.8068   31.18/0.9169
RFANET[33]       2020   ×4      32.66/0.9004   28.88/0.7894   27.79/0.7442   26.92/0.8112   31.41/0.9187
NLSN[34]         2021   ×4      32.64/0.9002   28.90/0.7890   27.80/0.7442   26.85/0.8094   31.42/0.9177
EMT[35]          2024   ×4      32.64/0.9003   28.97/0.7901   27.81/0.7441   26.98/0.8118   31.48/0.9190
BiGLFE[36]       2024   ×4      32.52/0.8971   28.64/0.7858   27.74/0.7377   26.60/0.8016   31.00/0.9123
CMSN[37]         2024   ×4      32.41/0.8975   28.77/0.7851   27.68/0.7398   26.44/0.7964   31.00/0.9133
CGT (proposed)   2024   ×4      32.81/0.9024   29.03/0.7921   27.85/0.7456   27.12/0.8155   31.81/0.9224
Tab.4 Comparison with advanced methods on five benchmark datasets
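For reference, the PSNR values in Tab.4 follow the standard definition of peak signal-to-noise ratio, 10·log10(MAX²/MSE), which super-resolution papers commonly evaluate on the luminance (Y) channel of images in the range [0, 255]; SSIM is computed as defined by Wang et al. [30]. The NumPy sketch below only illustrates the PSNR metric and is not code from the paper; the function name and the toy data are assumptions.

```python
# Hedged sketch of the PSNR metric (10*log10(MAX^2 / MSE)) used in Tab.4.
import numpy as np


def psnr(sr: np.ndarray, hr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a super-resolved image and its ground truth."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)


if __name__ == "__main__":
    # Toy example: a ground-truth patch and a slightly perturbed "reconstruction".
    hr = np.random.randint(0, 256, (48, 48, 3))
    sr = np.clip(hr + np.random.randint(-5, 6, hr.shape), 0, 255)
    print(f"PSNR: {psnr(sr, hr):.2f} dB")
```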
Fig.8 Visual comparison of reconstructed images on Set14 (×2)
Fig.9 Visual comparison of reconstructed images on BSD100 (×4)
Fig.10 Visual comparison of reconstructed images on Urban100 (×4)
[1]   DENG C, LUO X, WANG W. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 8389–8401
doi: 10.1109/JSTARS.2022.3207777
[2]   MA J, LIU S, CHENG S, et al. STSRNet: self-texture transfer super-resolution and refocusing network[J]. IEEE Transactions on Medical Imaging, 2022, 41(2): 383–393
doi: 10.1109/TMI.2021.3112923
[3]   ZHAO Z, ZHANG Y, LI C, et al. Thermal UAV image super-resolution guided by multiple visible cues[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5000314
[4]   DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295–307
doi: 10.1109/TPAMI.2015.2439281
[5]   KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1646–1654.
[6]   ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
[7]   HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132–7141.
[8]   LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1132–1140.
[9]   VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. California: Curran Associates Inc., 2017: 6000–6010.
[10]   LIANG J, CAO J, SUN G, et al. SwinIR: image restoration using swin transformer [C]// IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 1833–1844.
[11]   CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 9355–9366
[12]   YANG R, MA H, WU J, et al. ScalableViT: rethinking the context-oriented generalization of vision transformer [C]// Proceedings of Computer Vision – ECCV 2022. Cham: Springer, 2022: 480–496.
[13]   CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers [C]// Proceedings of Computer Vision – ECCV 2020. Cham: Springer, 2020: 213–229.
[14]   BEAL J, KIM E, TZENG E, et al. Toward Transformer-Based Object Detection [EB/OL]. (2020−12−17) [2025−09−15]. https://doi.org/10.48550/arXiv.2012.09958.
[15]   PENG Z, HUANG W, GU S, et al. Conformer: local features coupling global representations for visual recognition [C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 357−366.
[16]   WANG H, CHEN X, NI B, et al. Omni aggregation networks for lightweight image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 22378–22387.
[17]   HUI Z, GAO X, YANG Y, et al. Lightweight image super-resolution with information multi-distillation network [C]// 27th ACM International Conference on Multimedia. [S.l.]: ACM, 2019: 2024−2032.
[18]   JI J, ZHONG B, WU Q, et al. A channel-wise multi-scale network for single image super-resolution[J]. IEEE Signal Processing Letters, 2024, 31: 805–809
doi: 10.1109/LSP.2024.3372781
[19]   WANG X, JI H, SHI C, et al. Heterogeneous Graph Attention Network [C]// The World Wide Web Conference. San Francisco: [s.n.], 2019: 2022–2032.
[20]   LI X, DONG J, TANG J, et al. DLGSANet: lightweight dynamic local and global self-attention network for image super-resolution [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 12746–12755.
[21]   LIN H, CHENG X, WU X, et al. CAT: cross attention in vision transformer [C]// IEEE International Conference on Multimedia and Expo. Taipei: IEEE, 2022: 1–6.
[22]   CHEN Z, ZHANG Y, GU J, et al. Recursive Generalization Transformer for Image Super-Resolution [EB/OL]. (2023−03−11) [2025−09−15]. https://doi.org/10.48550/arXiv.2303.06373.
[23]   KINGMA D, BA J. Adam: a method for stochastic optimization [EB/OL]. (2014−12−14) [2025−09−15]. https://doi.org/10.48550/arXiv.1412.6980.
[24]   AGUSTSSON E, TIMOFTE R. NTIRE 2017 challenge on single image super-resolution: dataset and study [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1122–1131.
[25]   BEVILACQUA M, ROUMY A, GUILLEMOT C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding [C]// Proceedings of British Machine Vision Conference. Surrey: British Machine Vision Association, 2012.
[26]   ZEYDE R, ELAD M, PROTTER M. On single image scale-up using sparse-representations [C]// Proceedings of International Conference on Curves and Surfaces. Avignon: Springer, 2010: 711–730.
[27]   MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics [C]// 8th IEEE International Conference on Computer Vision. Vancouver: IEEE, 2001: 416–423.
[28]   HUANG J B, SINGH A, AHUJA N. Single image super-resolution from transformed self-exemplars [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 5197–5206.
[29]   MATSUI Y, ITO K, ARAMAKI Y, et al. Sketch-based manga retrieval using manga109 dataset[J]. Multimedia Tools and Applications, 2017, 76(20): 21811–21838
doi: 10.1007/s11042-016-4020-z
[30]   WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600–612
doi: 10.1109/TIP.2003.819861
[31]   ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
[32]   LYN J, YAN S. Non-local Second-order attention network for single image super resolution [C]// Proceedings of International Cross-Domain Conference for Machine Learning and Knowledge Extraction. Dublin: Springer, 2020: 267–279.
[33]   LIU J, ZHANG W, TANG Y, et al. Residual feature aggregation network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2356–2365.
[34]   MEI Y, FAN Y, ZHOU Y. Image super-resolution with non-local sparse attention [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3516–3525.
[35]   ZHENG L, ZHU J, SHI J, et al. Efficient mixed transformer for single image super-resolution[J]. Engineering Applications of Artificial Intelligence, 2024, 133: 108035
doi: 10.1016/j.engappai.2024.108035
[36]   HWANG K, YOON G, SONG J, et al. Fusing bi-directional global–local features for single image super-resolution[J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107336
doi: 10.1016/j.engappai.2023.107336