Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (5): 938-946    DOI: 10.3785/j.issn.1008-973X.2025.05.007
    
Super-resolution reconstruction of remote sensing image based on CNN and Transformer aggregation
Mingzhi HU, Jun SUN*, Biao YANG, Kairong CHANG, Junlong YANG
School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

Abstract  

Most remote sensing image super-resolution models rarely consider the impact of noise, blur, JPEG compression and other degradation factors on image reconstruction, and Transformer modules are limited in their ability to capture high-frequency information. To address these problems, a multi-layer degradation module was proposed and a hybrid CNN-Transformer network was designed, in which the CNN captures high-frequency details and the Transformer extracts global information. The two components were combined by an attention-based aggregation module, which enhances the reconstruction of local high-frequency details while maintaining global structural coherence. The model was tested on six randomly selected scenes from the AID dataset and compared with the MM-realSR model in terms of PSNR and SSIM. The results show that, averaged over the six scenes, the proposed model improves PSNR by 1.61 dB and SSIM by 0.023 over MM-realSR.
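As a rough illustration of what a multi-layer degradation pipeline can look like, the sketch below applies blur, noise and a compression-like loss several times before a single downsampling step. It is not the authors' module: the function names, the order of operations, the parameter values, the single final downsampling and the use of coarse quantization as a stand-in for real JPEG compression are all assumptions made for this example.

```python
import random


def box_blur(img):
    """3x3 mean filter with edge replication (hypothetical blur choice)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out


def add_noise(img, sigma, rng):
    """Additive Gaussian noise, clamped to the [0, 1] intensity range."""
    return [[min(max(v + rng.gauss(0.0, sigma), 0.0), 1.0) for v in row]
            for row in img]


def quantize(img, levels):
    """Coarse quantization: a crude stand-in for JPEG loss, NOT real JPEG."""
    return [[round(v * (levels - 1)) / (levels - 1) for v in row]
            for row in img]


def downsample(img, scale):
    """Keep every `scale`-th pixel in both directions."""
    return [row[::scale] for row in img[::scale]]


def multi_layer_degrade(img, layers=2, scale=4, seed=0):
    """Apply blur -> noise -> quantization `layers` times, then downsample once."""
    rng = random.Random(seed)
    for _ in range(layers):
        img = box_blur(img)
        img = add_noise(img, sigma=0.02, rng=rng)
        img = quantize(img, levels=32)
    return downsample(img, scale)


# Usage: degrade an 8x8 synthetic gradient image with values in [0, 1].
hr = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]
lr = multi_layer_degrade(hr, layers=2, scale=4, seed=0)
print(len(lr), len(lr[0]))
```

Setting `layers=1` gives a single-layer variant of the same sketch, which makes the difference between the two synthesis schemes easy to compare on the same input.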



Key words: remote sensing image; super-resolution reconstruction; multi-layer degradation module; high-frequency information; global information; aggregation module
Received: 28 May 2024      Published: 25 April 2025
CLC:  TP 751  
Fund:  National Natural Science Foundation of China (62363019); Yunnan Fundamental Research Projects (202401AT070355).
Corresponding Authors: Jun SUN     E-mail: 1404481618@qq.com;31408891@qq.com
Cite this article:

Mingzhi HU,Jun SUN,Biao YANG,Kairong CHANG,Junlong YANG. Super-resolution reconstruction of remote sensing image based on CNN and Transformer aggregation. Journal of ZheJiang University (Engineering Science), 2025, 59(5): 938-946.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.05.007     OR     https://www.zjujournals.com/eng/Y2025/V59/I5/938


Fig.1 Data synthesis process of single-layer and multi-layer degradation modules
Fig.2 Diagram of overall network structure and deep feature extraction module structure
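The idea of combining a high-frequency branch and a global branch through attention weights can be sketched with a toy, parameter-free example. This is only an illustration under stated assumptions: the function names are hypothetical, the weights come from a softmax over each channel's global average, and the learned layers that the actual aggregation module presumably contains are not described in this excerpt and are not modelled here.

```python
import math


def global_avg(feat):
    """Per-channel mean; `feat` is a list of channels, each a 2D list."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]


def softmax_pair(a, b):
    """Numerically stable softmax over two scalars."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def aggregate(high_freq, global_feat):
    """Per-channel attention-weighted sum of the two branch outputs.

    For each channel, the two weights are non-negative and sum to 1,
    so the result is a convex combination of the two feature maps.
    """
    fused = []
    stats = zip(high_freq, global_feat,
                global_avg(high_freq), global_avg(global_feat))
    for hf, gf, s_h, s_g in stats:
        w_h, w_g = softmax_pair(s_h, s_g)
        fused.append([[w_h * x + w_g * y for x, y in zip(row_h, row_g)]
                      for row_h, row_g in zip(hf, gf)])
    return fused


# Usage: one channel, 2x2 feature maps from each (hypothetical) branch.
cnn_out = [[[2.0, 2.0], [2.0, 2.0]]]
transformer_out = [[[0.0, 0.0], [0.0, 0.0]]]
print(aggregate(cnn_out, transformer_out))
```

Because the weights are a softmax, the branch with the stronger average response in a channel dominates that channel of the fused output, while the other branch still contributes.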
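| Model | Airport PSNR/dB | Airport SSIM | City PSNR/dB | City SSIM | Farmland PSNR/dB | Farmland SSIM | Parking lot PSNR/dB | Parking lot SSIM | Sports field PSNR/dB | Sports field SSIM | Port PSNR/dB | Port SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bicubic | 26.22 | 0.6871 | 23.73 | 0.6182 | 29.49 | 0.7347 | 19.73 | 0.5451 | 25.45 | 0.6905 | 22.23 | 0.6699 |
| SwinIR | 24.43 | 0.6598 | 22.19 | 0.5867 | 28.15 | 0.7142 | 17.79 | 0.5126 | 24.20 | 0.6666 | 20.09 | 0.6401 |
| CDC | 24.82 | 0.6272 | 22.50 | 0.5867 | 26.87 | 0.6601 | 20.04 | 0.5547 | 24.11 | 0.6550 | 21.71 | 0.6691 |
| DAN | 25.70 | 0.6922 | 23.60 | 0.6277 | 28.75 | 0.7307 | 19.72 | 0.5711 | 25.31 | 0.6960 | 21.45 | 0.6682 |
| Real-ESRGAN | 27.81 | 0.7296 | 24.82 | 0.6768 | 30.20 | 0.7664 | 21.06 | 0.6468 | 26.33 | 0.7338 | 22.84 | 0.7316 |
| BSRGAN | 27.74 | 0.7318 | 25.24 | 0.6754 | 30.80 | 0.7704 | 21.79 | 0.6413 | 27.10 | 0.7351 | 23.57 | 0.7328 |
| MM-realSR | 27.83 | 0.7649 | 25.64 | 0.7225 | 30.44 | 0.7852 | 22.42 | 0.6949 | 27.64 | 0.7826 | 23.95 | 0.7699 |
| realHAT-TG | 27.76 | 0.7470 | 25.34 | 0.6933 | 30.53 | 0.7739 | 21.96 | 0.6622 | 26.93 | 0.7495 | 23.54 | 0.7426 |
| Proposed model | 29.50 | 0.7857 | 27.27 | 0.7462 | 32.43 | 0.8068 | 23.45 | 0.7292 | 29.57 | 0.8060 | 25.39 | 0.7860 |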
Tab.1 PSNR and SSIM metrics of different models on six randomly selected scenes from AID test dataset
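The average gains quoted in the abstract can be re-derived from the MM-realSR and proposed-model rows of Tab.1. The short script below copies those two rows and averages the per-scene differences; the variable and function names are chosen for this example only.

```python
# (PSNR/dB, SSIM) per scene, copied from Tab.1 in the order:
# airport, city, farmland, parking lot, sports field, port.
mm_realsr = [(27.83, 0.7649), (25.64, 0.7225), (30.44, 0.7852),
             (22.42, 0.6949), (27.64, 0.7826), (23.95, 0.7699)]
proposed = [(29.50, 0.7857), (27.27, 0.7462), (32.43, 0.8068),
            (23.45, 0.7292), (29.57, 0.8060), (25.39, 0.7860)]


def mean_gain(ours, baseline, idx):
    """Average per-scene difference for metric `idx` (0 = PSNR, 1 = SSIM)."""
    return sum(o[idx] - b[idx] for o, b in zip(ours, baseline)) / len(ours)


psnr_gain = mean_gain(proposed, mm_realsr, 0)
ssim_gain = mean_gain(proposed, mm_realsr, 1)
print(f"mean PSNR gain: {psnr_gain:.3f} dB")
print(f"mean SSIM gain: {ssim_gain:.4f}")
```

The result is about 1.615 dB for PSNR and about 0.0233 for SSIM, in line with the reported 1.61 dB and 0.023 given that the table entries are themselves rounded.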
Fig.3 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on six samples from AID test set
Fig.4 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on four samples from WHU-RS19 dataset
Fig.5 Impact of degradation module on reconstruction effect
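| Method | Airport PSNR/dB | Airport SSIM | City PSNR/dB | City SSIM | Farmland PSNR/dB | Farmland SSIM | Parking lot PSNR/dB | Parking lot SSIM | Sports field PSNR/dB | Sports field SSIM | Port PSNR/dB | Port SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B | 26.88 | 0.7046 | 24.65 | 0.6516 | 29.67 | 0.7355 | 20.63 | 0.6037 | 26.47 | 0.7148 | 22.82 | 0.7018 |
| B+H | 28.55 | 0.7504 | 26.31 | 0.6929 | 31.47 | 0.7794 | 22.28 | 0.6422 | 28.30 | 0.7600 | 24.56 | 0.7343 |
| B+H+G | 28.60 | 0.7511 | 26.32 | 0.7026 | 31.53 | 0.7790 | 22.67 | 0.6774 | 28.53 | 0.7668 | 24.67 | 0.7512 |
| B+H+G+A1 | 29.07 | 0.7717 | 26.82 | 0.7283 | 32.12 | 0.7963 | 23.07 | 0.7018 | 29.08 | 0.7883 | 25.10 | 0.7724 |
| B+H+G+A | 29.50 | 0.7857 | 27.27 | 0.7462 | 32.43 | 0.8068 | 23.45 | 0.7292 | 29.57 | 0.8060 | 25.39 | 0.7860 |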
Tab.2 PSNR and SSIM metrics of different ablation modules on six scenes selected from AID test set
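To read the ablation at a glance, the per-scene PSNR values of Tab.2 can be averaged for each configuration and compared with the baseline row B. The sketch below does this; the row labels are used exactly as they appear in the table, since their expansions are not given in this excerpt, and the variable names are chosen for this example only.

```python
# PSNR/dB per scene, copied from Tab.2 in the order:
# airport, city, farmland, parking lot, sports field, port.
psnr = {
    "B":        [26.88, 24.65, 29.67, 20.63, 26.47, 22.82],
    "B+H":      [28.55, 26.31, 31.47, 22.28, 28.30, 24.56],
    "B+H+G":    [28.60, 26.32, 31.53, 22.67, 28.53, 24.67],
    "B+H+G+A1": [29.07, 26.82, 32.12, 23.07, 29.08, 25.10],
    "B+H+G+A":  [29.50, 27.27, 32.43, 23.45, 29.57, 25.39],
}

# Six-scene mean for each configuration, and its gain over the baseline B.
mean_psnr = {name: sum(vals) / len(vals) for name, vals in psnr.items()}
gain_over_b = {name: m - mean_psnr["B"] for name, m in mean_psnr.items()}

for name in psnr:
    print(f"{name:<9} mean PSNR {mean_psnr[name]:.2f} dB, "
          f"gain over B {gain_over_b[name]:+.2f} dB")
```

The means increase from one row to the next, and the full configuration B+H+G+A ends up roughly 2.75 dB above B on average.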
| Nb | Test set PSNR/dB | Test set SSIM |
| --- | --- | --- |
| 1 | 27.76 | 0.7690 |
| 2 | 27.78 | 0.7724 |
| 3 | 27.94 | 0.7769 |
| 4 | 27.86 | 0.7740 |
Tab.3 Average PSNR and SSIM metrics across six scenes in AID dataset for deep feature extraction modules with varying counts
Fig.6 Visualization of input and output feature maps for high-frequency and global modules