A multi-layer degradation module was proposed to address the fact that most remote sensing image super-resolution models rarely account for the impact of noise, blur, JPEG compression, and other degradations on image reconstruction, and that Transformer modules are limited in capturing high-frequency information. A CNN-Transformer hybrid network was designed, in which the CNN branch captures high-frequency details and the Transformer branch extracts global information. The two branches were fused by an attention-based aggregation module, enhancing the reconstruction of local high-frequency details while maintaining global structural coherence. The model was evaluated on six randomly selected scenes from the AID dataset and compared with the MM-realSR model in terms of PSNR and SSIM. Results show an average PSNR improvement of 1.61 dB and an average SSIM increase of 0.023 over MM-realSR.
Mingzhi HU, Jun SUN, Biao YANG, Kairong CHANG, Junlong YANG. Super-resolution reconstruction of remote sensing image based on CNN and Transformer aggregation. Journal of Zhejiang University (Engineering Science), 2025, 59(5): 938-946.
Fig.1 Data synthesis process of single-layer and multi-layer degradation modules
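As a hedged illustration only, a single degradation pass over a high-resolution image could be synthesized along the following lines; the kernel size, noise level, JPEG quality, and the use of OpenCV are assumptions for this sketch rather than the paper's actual settings, and the multi-layer module would chain several randomized passes of this kind.

```python
# Minimal single-layer degradation sketch (blur -> downsample -> noise -> JPEG),
# modeled loosely on common blind-SR pipelines; parameters are illustrative only.
import cv2
import numpy as np

def degrade_once(hr, scale=4, blur_sigma=1.2, noise_sigma=5.0, jpeg_q=70):
    """Synthesize one low-resolution image from a high-resolution uint8 BGR image."""
    img = cv2.GaussianBlur(hr, (7, 7), blur_sigma)                                   # blur
    h, w = img.shape[:2]
    img = cv2.resize(img, (w // scale, h // scale), interpolation=cv2.INTER_AREA)    # downsample
    noise = np.random.normal(0, noise_sigma, img.shape)                              # additive Gaussian noise
    img = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])          # JPEG compression artifacts
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# Example: hr = cv2.imread("aid_scene.png"); lr = degrade_once(hr)
```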
Fig.2 Diagram of overall network structure and deep feature extraction module structure
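For orientation, the following minimal PyTorch sketch mirrors the idea summarized in the abstract: a convolutional branch for local high-frequency detail, a self-attention branch for global context, and an attention-based aggregation that fuses the two feature maps. Module names, channel counts, and the gating design are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of CNN + Transformer feature aggregation; all sizes are illustrative.
import torch
import torch.nn as nn

class HighFreqBranch(nn.Module):          # CNN branch: local high-frequency details
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)           # residual local features

class GlobalBranch(nn.Module):            # Transformer-style branch: global self-attention
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))        # (B, H*W, C)
        y, _ = self.attn(tokens, tokens, tokens)
        return x + y.transpose(1, 2).reshape(b, c, h, w)

class AttentionAggregation(nn.Module):    # fuse the two branches with learned per-pixel weights
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, hf, glob):
        w = self.gate(torch.cat([hf, glob], dim=1))             # attention weights in [0, 1]
        return w * hf + (1 - w) * glob

if __name__ == "__main__":
    feat = torch.randn(1, 64, 48, 48)
    fused = AttentionAggregation()(HighFreqBranch()(feat), GlobalBranch()(feat))
    print(fused.shape)                                          # torch.Size([1, 64, 48, 48])
```

The per-pixel gate lets the fused feature lean on the CNN branch where fine texture dominates and on the attention branch where large-scale structure matters, which is the intent of the aggregation module described above.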
Model           Airport             City                Farmland            Parking lot         Playground          Harbor
                PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM
Bicubic         26.22     0.6871    23.73     0.6182    29.49     0.7347    19.73     0.5451    25.45     0.6905    22.23     0.6699
SwinIR          24.43     0.6598    22.19     0.5867    28.15     0.7142    17.79     0.5126    24.20     0.6666    20.09     0.6401
CDC             24.82     0.6272    22.50     0.5867    26.87     0.6601    20.04     0.5547    24.11     0.6550    21.71     0.6691
DAN             25.70     0.6922    23.60     0.6277    28.75     0.7307    19.72     0.5711    25.31     0.6960    21.45     0.6682
Real-ESRGAN     27.81     0.7296    24.82     0.6768    30.20     0.7664    21.06     0.6468    26.33     0.7338    22.84     0.7316
BSRGAN          27.74     0.7318    25.24     0.6754    30.80     0.7704    21.79     0.6413    27.10     0.7351    23.57     0.7328
MM-realSR       27.83     0.7649    25.64     0.7225    30.44     0.7852    22.42     0.6949    27.64     0.7826    23.95     0.7699
realHAT-TG      27.76     0.7470    25.34     0.6933    30.53     0.7739    21.96     0.6622    26.93     0.7495    23.54     0.7426
Proposed model  29.50     0.7857    27.27     0.7462    32.43     0.8068    23.45     0.7292    29.57     0.8060    25.39     0.7860
Tab.1 PSNR and SSIM metrics of different models on six randomly selected scenes from AID test dataset
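The PSNR and SSIM values above are standard full-reference image quality metrics. As a hedged sketch, assuming uint8 RGB images and scikit-image >= 0.19 (the paper's exact evaluation protocol, e.g. color space or border cropping, is not given here), they could be computed as:

```python
# Illustrative metric computation with scikit-image; settings are assumptions.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr: np.ndarray, hr: np.ndarray):
    """sr, hr: uint8 images of identical shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```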
Fig.3 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on six samples from AID test set
Fig.4 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on four samples from WHU-RS19 dataset
Fig.5 Impact of degradation module on reconstruction effect
Method          Airport             City                Farmland            Parking lot         Playground          Harbor
                PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM
B               26.88     0.7046    24.65     0.6516    29.67     0.7355    20.63     0.6037    26.47     0.7148    22.82     0.7018
B+H             28.55     0.7504    26.31     0.6929    31.47     0.7794    22.28     0.6422    28.30     0.7600    24.56     0.7343
B+H+G           28.60     0.7511    26.32     0.7026    31.53     0.7790    22.67     0.6774    28.53     0.7668    24.67     0.7512
B+H+G+A1        29.07     0.7717    26.82     0.7283    32.12     0.7963    23.07     0.7018    29.08     0.7883    25.10     0.7724
B+H+G+A         29.50     0.7857    27.27     0.7462    32.43     0.8068    23.45     0.7292    29.57     0.8060    25.39     0.7860
Tab.2 PSNR and SSIM metrics of different ablation modules on six scenes selected from AID test set
Nb    PSNR/dB (test set)    SSIM (test set)
1     27.76                 0.7690
2     27.78                 0.7724
3     27.94                 0.7769
4     27.86                 0.7740
Tab.3 Average PSNR and SSIM metrics across six scenes in AID dataset for deep feature extraction modules with varying counts
Fig.6 Visualization of input and output feature maps for high-frequency and global modules
[1] ZHANG H, YANG Z, ZHANG L, et al. Super-resolution reconstruction for multi-angle remote sensing images considering resolution differences [J]. Remote Sensing, 2014, 6 (1): 637-657. doi: 10.3390/rs6010637.
[2] PAPATHANASSIOU C, PETROU M. Super resolution: an overview [C]// IEEE International Geoscience and Remote Sensing Symposium. Seoul: IEEE, 2005: 5655-5658.
[3] GLASNER D, BAGON S, IRANI M. Super-resolution from a single image [C]// IEEE 12th International Conference on Computer Vision. Kyoto: IEEE, 2009: 349-356.
[4] DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38 (2): 295-307.
[5] LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 136-144.
[6] BEGIN I, FERRIE F R. Blind super-resolution using a learning-based approach [C]// Proceedings of the 17th International Conference on Pattern Recognition. Cambridge: IEEE, 2004: 85-89.
[7] JOSHI M V, CHAUDHURI S, PANUGANTI R. A learning-based method for image super-resolution from zoomed observations [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2005, 35 (3): 527-537. doi: 10.1109/TSMCB.2005.846647.
[8] CHAN T M, ZHANG J. An improved super-resolution with manifold learning and histogram matching [C]// Advances in Biometrics: International Conference. Hong Kong: Springer, 2005: 756-762.
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Ethiopia: [s. n.], 2020.
[10] DONG C, LOY C C, HE K, et al. Learning a deep convolutional network for image super-resolution [C]// 13th European Conference on Computer Vision. Switzerland: Springer, 2014: 184-199.
[11] KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1646-1654.
[12] LI W, ZHOU K, QI L, et al. LAPAR: linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond [J]. Advances in Neural Information Processing Systems, 2020, 33: 20343-20355.
[13] LIANG J, CAO J, SUN G, et al. SwinIR: image restoration using swin transformer [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1833-1844.
[14] CHEN H, WANG Y, GUO T, et al. Pre-trained image processing transformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12299-12310.
[15] LEI S, SHI Z, ZOU Z. Super-resolution for remote sensing images via local–global combined network [J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14 (8): 1243-1247. doi: 10.1109/LGRS.2017.2704122.
[16] PAN Z, MA W, GUO J, et al. Super-resolution of single remote sensing image based on residual dense backprojection networks [J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57 (10): 7918-7933. doi: 10.1109/TGRS.2019.2917427.
[17] ZHANG D, SHAO J, LI X, et al. Remote sensing image super-resolution via mixed high-order attention network [J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59 (6): 5183-5196.
[18] BAI J, YUAN L, XIA S T, et al. Improving vision transformers by revisiting high-frequency components [C]// European Conference on Computer Vision. Cham: Springer, 2022: 1-18.
[19] ELAD M, FEUER A. Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images [J]. IEEE Transactions on Image Processing, 1997, 6 (12): 1646-1658. doi: 10.1109/83.650118.
[20] LIU C, SUN D. On Bayesian adaptive video super resolution [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36 (2): 346-360.
[21] ZHANG K, LIANG J, VAN GOOL L, et al. Designing a practical degradation model for deep blind image super-resolution [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 4791-4800.
[22] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
[23] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. [2024-05-15]. https://arxiv.org/abs/1706.03762.
[24] ZAMIR S W, ARORA A, KHAN S, et al. Restormer: efficient transformer for high-resolution image restoration [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5728-5739.
[25] XIA G, HU J, HU F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification [J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55 (7): 3965-3981. doi: 10.1109/TGRS.2017.2685945.
[26] DAI D, YANG W. Satellite image classification via two-layer sparse coding with biased image representation [J]. IEEE Geoscience and Remote Sensing Letters, 2010, 8 (1): 173-176.
[27] TANCHENKO A. Visual-PSNR measure of image quality [J]. Journal of Visual Communication and Image Representation, 2014, 25 (5): 874-878. doi: 10.1016/j.jvcir.2014.01.008.
[28] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13 (4): 600-612. doi: 10.1109/TIP.2003.819861.
[29] ZHANG W, LI X, SHI G, et al. Real-world image super-resolution as multi-task learning [J]. Advances in Neural Information Processing Systems, 2023, 36: 21003-21022.
[30] WANG X, XIE L, DONG C, et al. Real-ESRGAN: training real-world blind super-resolution with pure synthetic data [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1905-1914.
[31] MOU C, WU Y, WANG X, et al. Metric learning based interactive modulation for real-world super-resolution [C]// European Conference on Computer Vision. Cham: Springer, 2022: 723-740.
[32] WEI P, XIE Z, LU H, et al. Component divide-and-conquer for real-world image super-resolution [C]// 16th European Conference on Computer Vision. Glasgow: Springer, 2020: 101-117.