Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (12): 2516-2526    DOI: 10.3785/j.issn.1008-973X.2025.12.006
Computer Technology
Image super-resolution reconstruction method driven by two-dimensional cross-fusion
Xiaofen JIA 1,2, Zixiang WANG 3, Baiting ZHAO 3, Zhenhuan LIANG 2, Rui HU 2
1. State Key Laboratory of Digital Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, China
2. Institute of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, China
3. Institute of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232001, China
Abstract:

Existing image super-resolution models do not sufficiently extract the underlying features within the deep semantic information of an image, which leads to loss of detail in the reconstructed image. To address this, an image super-resolution model driven by cross-fusion of the spatial and channel dimensions was proposed. Using Transformer's attention mechanism, spatial intensive global attention (SIGA) was built in the spatial dimension to capture the positional relationships of deep spatial regions, and channel cross attention (CCA) was built in the channel dimension to capture feature dependencies between channels. SIGA and CCA were each connected in parallel with depthwise separable convolutions to enhance the model's ability to extract low-level features from high-level semantic information. Meanwhile, a cross fusion block (CFB) was developed using a spatial compression strategy to ensure efficient fusion of fine-grained features between the attention modules and the depthwise separable convolutions. Cascading the dual-dimension fusion modules facilitated the comprehensive intersection and aggregation of deep semantic information, thereby restoring delicate structures in the image. Experimental results showed that the proposed model achieved PSNR improvements of 0.52 dB and 0.81 dB over the latest method BiGLFE on Urban100 and Manga109, respectively, at a scale factor of 4.

Key words: image super-resolution    Transformer    CNN    fusion    spatial attention    channel attention
Received: 2024-12-03    Published: 2025-11-25
CLC number: TP 391
Funding: National Natural Science Foundation of China (52174141); Natural Science Foundation of Anhui Province (2108085ME158); Research Fund of the Joint Research Center for Occupational Medicine and Health, Institute of Health and Medicine, Hefei Comprehensive National Science Center (OMH-2023-10); Scientific Research Start-up Fund for Introduced Talents of Anhui University of Science and Technology (2022yjrc44).
About the author: JIA Xiaofen (1978—), female, professor, engaged in research on image processing, deep learning, and digital twins. orcid.org/0000-0002-1891-7613. E-mail: jxfzbt2008@163.com

Cite this article:

Xiaofen JIA, Zixiang WANG, Baiting ZHAO, Zhenhuan LIANG, Rui HU. Image super-resolution reconstruction method driven by two-dimensional cross-fusion[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(12): 2516-2526.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.12.006        https://www.zjujournals.com/eng/CN/Y2025/V59/I12/2516

Fig. 1  Overall network structure of CGT
Fig. 2  Structure of spatial intensive global attention (SIGA)
Fig. 3  Structure of channel cross attention (CCA)
Fig. 4  Structure of cross fusion block (CFB)
Fig. 5  Comparison test with different numbers of CGTB on Manga109 (×4)
p     PM/10^6    FL/10^9
2     4.203      20.132
4     7.708      35.485
6     11.212     50.838
8     14.717     66.19
10    18.221     81.543
Table 1  Parameters corresponding to different numbers of CGTB
Connection        N_SC  PM/10^6  FL/10^9  Set5 (PSNR/dB / SSIM)  Set14          BSD100         Urban100
SGB+CGB serial    2     8.29     35.060   32.66 / 0.9002         28.93 / 0.7896  27.78 / 0.7439  26.87 / 0.8094
                  3     11.21    50.838   32.81 / 0.9024         29.03 / 0.7921  27.85 / 0.7456  27.12 / 0.8155
                  4     18.91    77.745   32.70 / 0.9012         28.96 / 0.7906  27.82 / 0.7449  27.01 / 0.8128
SGB+CGB parallel  2     11.71    48.321   32.62 / 0.9001         28.95 / 0.7893  27.78 / 0.7436  26.88 / 0.8092
                  3     16.35    66.509   32.66 / 0.9007         28.97 / 0.7899  27.81 / 0.7441  26.94 / 0.8110
                  4     20.98    84.698   32.63 / 0.9002         28.90 / 0.7887  27.78 / 0.7433  26.81 / 0.8071
Table 2  Performance test of serial and parallel connections with different numbers of SGB+CGB
Method   Baseline  SIGA  CCA  CFB  PM/10^6  FL/10^9  Set5 (PSNR/dB / SSIM)  Set14          BSD100         Urban100
CGT-S    √         √     ×    ×    11.21    52.82    32.61 / 0.9003         28.94 / 0.7899  27.79 / 0.7444  26.89 / 0.8105
CGT-SF   √         √     ×    √    11.21    53.44    32.62 / 0.9008         28.95 / 0.7902  27.80 / 0.7446  26.94 / 0.8114
CGT-C    √         ×     √    ×    11.20    47.61    32.52 / 0.8991         28.81 / 0.7865  27.71 / 0.7412  26.57 / 0.7996
CGT-CF   √         ×     √    √    11.20    48.23    32.53 / 0.8990         28.82 / 0.7871  27.72 / 0.7412  26.58 / 0.8006
CGT-SC   √         √     √    ×    11.21    50.22    32.63 / 0.9006         28.94 / 0.7897  27.80 / 0.7443  26.84 / 0.8081
CGT      √         √     √    √    11.21    50.83    32.81 / 0.9024         29.03 / 0.7921  27.85 / 0.7456  27.12 / 0.8155
Table 3  Experimental results of the five ablated CGT networks
Fig. 6  Visualization of features extracted by different modules
Fig. 7  Comparison of parameter quantity on Urban100 (×4)
Method          Year  Scale  Set5 (PSNR/dB / SSIM)  Set14          BSD100         Urban100       Manga109
IMDN[17]        2019  ×2     38.00 / 0.9605         33.63 / 0.9177  32.19 / 0.8996  32.17 / 0.9283  —
HAN[19]         2019  ×2     38.27 / 0.9614         34.16 / 0.9217  32.41 / 0.9027  33.35 / 0.9385  39.46 / 0.9785
SAN[32]         2020  ×2     38.31 / 0.9620         34.07 / 0.9213  32.42 / 0.9028  33.10 / 0.9370  39.32 / 0.9792
RFANET[33]      2020  ×2     38.26 / 0.9615         34.16 / 0.9220  32.41 / 0.9026  33.33 / 0.9389  39.44 / 0.9783
NLSN[34]        2021  ×2     38.34 / 0.9618         34.08 / 0.9231  32.43 / 0.9027  33.42 / 0.9394  39.59 / 0.9789
EMT[35]         2024  ×2     38.29 / 0.9615         34.23 / 0.9229  32.40 / 0.9027  33.28 / 0.9385  39.59 / 0.9789
BiGLFE[36]      2024  ×2     38.15 / 0.9601         33.80 / 0.9194  32.29 / 0.8994  32.71 / 0.9329  38.96 / 0.9771
CMSN[37]        2024  ×2     38.18 / 0.9612         33.84 / 0.9195  32.30 / 0.9014  32.65 / 0.9329  39.11 / 0.9780
CGT (proposed)  2024  ×2     38.36 / 0.9618         34.11 / 0.9220  32.41 / 0.9026  33.48 / 0.9395  39.67 / 0.9792
IMDN[17]        2019  ×3     34.36 / 0.9270         30.32 / 0.8417  29.09 / 0.8046  28.17 / 0.8519  33.61 / 0.9445
HAN[19]         2019  ×3     34.75 / 0.9299         30.67 / 0.8483  29.32 / 0.8110  29.10 / 0.8705  34.48 / 0.9500
SAN[32]         2020  ×3     34.75 / 0.9300         30.59 / 0.8476  29.33 / 0.8112  28.93 / 0.8671  34.30 / 0.9494
RFANET[33]      2020  ×3     34.79 / 0.9300         30.67 / 0.8487  29.34 / 0.8115  29.15 / 0.8720  34.59 / 0.9506
NLSN[34]        2021  ×3     34.85 / 0.9306         30.70 / 0.8485  29.34 / 0.8117  29.25 / 0.8726  34.57 / 0.9508
EMT[35]         2024  ×3     34.80 / 0.9303         30.71 / 0.8489  29.33 / 0.8113  29.16 / 0.8716  34.65 / 0.9508
BiGLFE[36]      2024  ×3     34.59 / 0.9276         30.33 / 0.8449  29.24 / 0.8059  28.76 / 0.8642  34.03 / 0.9460
CMSN[37]        2024  ×3     34.62 / 0.9288         30.50 / 0.8452  29.22 / 0.8082  28.60 / 0.8612  34.12 / 0.9476
CGT (proposed)  2024  ×3     34.91 / 0.9308         30.75 / 0.8496  29.36 / 0.8119  29.26 / 0.8729  34.77 / 0.9514
IMDN[17]        2019  ×4     32.21 / 0.8948         28.58 / 0.7811  27.56 / 0.7353  26.04 / 0.7838  —
HAN[19]         2019  ×4     32.59 / 0.9000         28.87 / 0.7891  27.78 / 0.7444  26.96 / 0.8109  31.27 / 0.9184
SAN[32]         2020  ×4     32.64 / 0.9003         28.92 / 0.7888  27.78 / 0.7436  26.79 / 0.8068  31.18 / 0.9169
RFANET[33]      2020  ×4     32.66 / 0.9004         28.88 / 0.7894  27.79 / 0.7442  26.92 / 0.8112  31.41 / 0.9187
NLSN[34]        2021  ×4     32.64 / 0.9002         28.90 / 0.7890  27.80 / 0.7442  26.85 / 0.8094  31.42 / 0.9177
EMT[35]         2024  ×4     32.64 / 0.9003         28.97 / 0.7901  27.81 / 0.7441  26.98 / 0.8118  31.48 / 0.9190
BiGLFE[36]      2024  ×4     32.52 / 0.8971         28.64 / 0.7858  27.74 / 0.7377  26.60 / 0.8016  31.00 / 0.9123
CMSN[37]        2024  ×4     32.41 / 0.8975         28.77 / 0.7851  27.68 / 0.7398  26.44 / 0.7964  31.00 / 0.9133
CGT (proposed)  2024  ×4     32.81 / 0.9024         29.03 / 0.7921  27.85 / 0.7456  27.12 / 0.8155  31.81 / 0.9224
Table 4  Comparison of the proposed method with state-of-the-art methods on five benchmark datasets
Fig. 8  Visual comparison of reconstruction on Set14 (×2)
Fig. 9  Visual comparison of reconstruction on B100 (×4)
Fig. 10  Visual comparison of reconstruction on Urban100 (×4)
1 DENG C, LUO X, WANG W. Multiple frame splicing and degradation learning for hyperspectral imagery super-resolution [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 8389–8401.
doi: 10.1109/JSTARS.2022.3207777
2 MA J, LIU S, CHENG S, et al. STSRNet: self-texture transfer super-resolution and refocusing network [J]. IEEE Transactions on Medical Imaging, 2022, 41(2): 383–393.
doi: 10.1109/TMI.2021.3112923
3 ZHAO Z, ZHANG Y, LI C, et al. Thermal UAV image super-resolution guided by multiple visible cues [J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5000314.
4 DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 295–307.
doi: 10.1109/TPAMI.2015.2439281
5 KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks [C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1646–1654.
6 ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
7 HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132–7141.
8 LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1132–1140.
9 VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. California: Curran Associates Inc., 2017: 6000–6010.
10 LIANG J, CAO J, SUN G, et al. SwinIR: image restoration using swin transformer [C]// IEEE/CVF International Conference on Computer Vision Workshops. Montreal: IEEE, 2021: 1833–1844.
11 CHU X, TIAN Z, WANG Y, et al. Twins: revisiting the design of spatial attention in vision transformers [J]. Advances in Neural Information Processing Systems, 2021, 34: 9355–9366.
12 YANG R, MA H, WU J, et al. ScalableViT: rethinking the context-oriented generalization of vision transformer [C]// Proceedings of Computer Vision – ECCV 2022. Cham: Springer, 2022: 480–496.
13 CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with transformers [C]// Proceedings of Computer Vision – ECCV 2020. Cham: Springer, 2020: 213–229.
14 BEAL J, KIM E, TZENG E, et al. Toward transformer-based object detection [EB/OL]. (2020−12−17) [2025−09−15]. https://doi.org/10.48550/arXiv.2012.09958.
15 PENG Z, HUANG W, GU S, et al. Conformer: local features coupling global representations for visual recognition [C]// 2021 IEEE/CVF International Conference on Computer Vision (ICCV). Montreal: IEEE, 2021: 357−366.
16 WANG H, CHEN X, NI B, et al. Omni aggregation networks for lightweight image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 22378–22387.
17 HUI Z, GAO X, YANG Y, et al. Lightweight image super-resolution with information multi-distillation network [C]// 27th ACM International Conference on Multimedia. [S.l.]: ACM, 2019: 2024−2032.
18 JI J, ZHONG B, WU Q, et al. A channel-wise multi-scale network for single image super-resolution [J]. IEEE Signal Processing Letters, 2024, 31: 805–809.
doi: 10.1109/LSP.2024.3372781
19 NIU B, WEN W, REN W, et al. Single image super-resolution via a holistic attention network [C]// Proceedings of Computer Vision – ECCV 2020. Cham: Springer, 2020: 191–207.
20 LI X, DONG J, TANG J, et al. DLGSANet: lightweight dynamic local and global self-attention network for image super-resolution [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 12746–12755.
21 LIN H, CHENG X, WU X, et al. CAT: cross attention in vision transformer [C]// IEEE International Conference on Multimedia and Expo. Taipei: IEEE, 2022: 1–6.
22 CHEN Z, ZHANG Y, GU J, et al. Recursive generalization transformer for image super-resolution [EB/OL]. (2023−03−11) [2025−09−15]. https://doi.org/10.48550/arXiv.2303.06373.
23 KINGMA D, BA J. Adam: a method for stochastic optimization [EB/OL]. (2014−12−14) [2025−09−15]. https://doi.org/10.48550/arXiv.1412.6980.
24 AGUSTSSON E, TIMOFTE R. NTIRE 2017 challenge on single image super-resolution: dataset and study [C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1122–1131.
25 BEVILACQUA M, ROUMY A, GUILLEMOT C, et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding [C]// Proceedings of British Machine Vision Conference. Surrey: British Machine Vision Association, 2012.
26 ZEYDE R, ELAD M, PROTTER M. On single image scale-up using sparse-representations [C]// Proceedings of International Conference on Curves and Surfaces. Avignon: Springer, 2010: 711–730.
27 MARTIN D, FOWLKES C, TAL D, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics [C]// 8th IEEE International Conference on Computer Vision. Vancouver: IEEE, 2001: 416–423.
28 HUANG J B, SINGH A, AHUJA N. Single image super-resolution from transformed self-exemplars [C]// IEEE Conference on Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 5197–5206.
29 MATSUI Y, ITO K, ARAMAKI Y, et al. Sketch-based manga retrieval using manga109 dataset [J]. Multimedia Tools and Applications, 2017, 76(20): 21811–21838.
doi: 10.1007/s11042-016-4020-z
30 WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13(4): 600–612.
doi: 10.1109/TIP.2003.819861
31 ZHANG Y, TIAN Y, KONG Y, et al. Residual dense network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2472–2481.
32 LYN J, YAN S. Non-local second-order attention network for single image super resolution [C]// Proceedings of International Cross-Domain Conference for Machine Learning and Knowledge Extraction. Dublin: Springer, 2020: 267–279.
33 LIU J, ZHANG W, TANG Y, et al. Residual feature aggregation network for image super-resolution [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 2356–2365.
34 MEI Y, FAN Y, ZHOU Y. Image super-resolution with non-local sparse attention [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 3516–3525.
35 ZHENG L, ZHU J, SHI J, et al. Efficient mixed transformer for single image super-resolution [J]. Engineering Applications of Artificial Intelligence, 2024, 133: 108035.
doi: 10.1016/j.engappai.2024.108035
36 HWANG K, YOON G, SONG J, et al. Fusing bi-directional global–local features for single image super-resolution [J]. Engineering Applications of Artificial Intelligence, 2024, 127: 107336.
doi: 10.1016/j.engappai.2023.107336