A multi-layer degradation module was proposed to address the fact that most remote sensing image super-resolution models rarely account for the impact of noise, blur, JPEG compression, and other degradations on image reconstruction, and that Transformer modules are limited in capturing high-frequency information. A CNN-Transformer hybrid network was designed, in which the CNN branch captures high-frequency details and the Transformer branch extracts global information. The two branches were fused by an attention-based aggregation module, enhancing the reconstruction of local high-frequency details while maintaining global structural coherence. The model was evaluated on six randomly selected scenes from the AID dataset and compared with the MM-realSR model in terms of PSNR and SSIM. Results show an average PSNR improvement of 1.61 dB and an average SSIM increase of 0.023 over MM-realSR.
Mingzhi HU, Jun SUN, Biao YANG, Kairong CHANG, Junlong YANG. Super-resolution reconstruction of remote sensing image based on CNN and Transformer aggregation. Journal of Zhejiang University (Engineering Science), 2025, 59(5): 938-946.
Fig.1 Data synthesis process of single-layer and multi-layer degradation modules
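As a hedged illustration only, a single degradation pass over a high-resolution image could be synthesized along the following lines; the kernel size, noise level, JPEG quality, and the use of OpenCV are assumptions for this sketch rather than the paper's actual settings, and the multi-layer module would chain several randomized passes of this kind.

```python
# Minimal single-layer degradation sketch (blur -> downsample -> noise -> JPEG),
# modeled loosely on common blind-SR pipelines; parameters are illustrative only.
import cv2
import numpy as np

def degrade_once(hr, scale=4, blur_sigma=1.2, noise_sigma=5.0, jpeg_q=70):
    """Synthesize one low-resolution image from a high-resolution uint8 BGR image."""
    img = cv2.GaussianBlur(hr, (7, 7), blur_sigma)                                   # blur
    h, w = img.shape[:2]
    img = cv2.resize(img, (w // scale, h // scale), interpolation=cv2.INTER_AREA)    # downsample
    noise = np.random.normal(0, noise_sigma, img.shape)                              # additive Gaussian noise
    img = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])          # JPEG compression artifacts
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# Example: hr = cv2.imread("aid_scene.png"); lr = degrade_once(hr)
```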
Fig.2 Diagram of overall network structure and deep feature extraction module structure
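For orientation, the following minimal PyTorch sketch mirrors the idea summarized in the abstract: a convolutional branch for local high-frequency detail, a self-attention branch for global context, and an attention-based aggregation that fuses the two feature maps. Module names, channel counts, and the gating design are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of CNN + Transformer feature aggregation; all sizes are illustrative.
import torch
import torch.nn as nn

class HighFreqBranch(nn.Module):          # CNN branch: local high-frequency details
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)           # residual local features

class GlobalBranch(nn.Module):            # Transformer-style branch: global self-attention
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))        # (B, H*W, C)
        y, _ = self.attn(tokens, tokens, tokens)
        return x + y.transpose(1, 2).reshape(b, c, h, w)

class AttentionAggregation(nn.Module):    # fuse the two branches with learned per-pixel weights
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, hf, glob):
        w = self.gate(torch.cat([hf, glob], dim=1))             # attention weights in [0, 1]
        return w * hf + (1 - w) * glob

if __name__ == "__main__":
    feat = torch.randn(1, 64, 48, 48)
    fused = AttentionAggregation()(HighFreqBranch()(feat), GlobalBranch()(feat))
    print(fused.shape)                                          # torch.Size([1, 64, 48, 48])
```

The per-pixel gate lets the fused feature lean on the CNN branch where fine texture dominates and on the attention branch where large-scale structure matters, which is the intent of the aggregation module described above.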
Model           Airport             City                Farmland            Parking lot         Playground          Harbor
                PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM
Bicubic         26.22     0.6871    23.73     0.6182    29.49     0.7347    19.73     0.5451    25.45     0.6905    22.23     0.6699
SwinIR          24.43     0.6598    22.19     0.5867    28.15     0.7142    17.79     0.5126    24.20     0.6666    20.09     0.6401
CDC             24.82     0.6272    22.50     0.5867    26.87     0.6601    20.04     0.5547    24.11     0.6550    21.71     0.6691
DAN             25.70     0.6922    23.60     0.6277    28.75     0.7307    19.72     0.5711    25.31     0.6960    21.45     0.6682
Real-ESRGAN     27.81     0.7296    24.82     0.6768    30.20     0.7664    21.06     0.6468    26.33     0.7338    22.84     0.7316
BSRGAN          27.74     0.7318    25.24     0.6754    30.80     0.7704    21.79     0.6413    27.10     0.7351    23.57     0.7328
MM-realSR       27.83     0.7649    25.64     0.7225    30.44     0.7852    22.42     0.6949    27.64     0.7826    23.95     0.7699
realHAT-TG      27.76     0.7470    25.34     0.6933    30.53     0.7739    21.96     0.6622    26.93     0.7495    23.54     0.7426
Proposed model  29.50     0.7857    27.27     0.7462    32.43     0.8068    23.45     0.7292    29.57     0.8060    25.39     0.7860
Tab.1 PSNR and SSIM metrics of different models on six randomly selected scenes from AID test dataset
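The PSNR and SSIM values above are standard full-reference image quality metrics. As a hedged sketch, assuming uint8 RGB images and scikit-image >= 0.19 (the paper's exact evaluation protocol, e.g. color space or border cropping, is not given here), they could be computed as:

```python
# Illustrative metric computation with scikit-image; settings are assumptions.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr: np.ndarray, hr: np.ndarray):
    """sr, hr: uint8 images of identical shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    return psnr, ssim
```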
Fig.3 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on six samples from AID test set
Fig.4 Visual comparison of reconstruction results across different models: quantitative PSNR/SSIM evaluation on four samples from WHU-RS19 dataset
Fig.5 Impact of degradation module on reconstruction effect
Method          Airport             City                Farmland            Parking lot         Playground          Harbor
                PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM      PSNR/dB   SSIM
B               26.88     0.7046    24.65     0.6516    29.67     0.7355    20.63     0.6037    26.47     0.7148    22.82     0.7018
B+H             28.55     0.7504    26.31     0.6929    31.47     0.7794    22.28     0.6422    28.30     0.7600    24.56     0.7343
B+H+G           28.60     0.7511    26.32     0.7026    31.53     0.7790    22.67     0.6774    28.53     0.7668    24.67     0.7512
B+H+G+A1        29.07     0.7717    26.82     0.7283    32.12     0.7963    23.07     0.7018    29.08     0.7883    25.10     0.7724
B+H+G+A         29.50     0.7857    27.27     0.7462    32.43     0.8068    23.45     0.7292    29.57     0.8060    25.39     0.7860
Tab.2 PSNR and SSIM metrics of different ablation modules on six scenes selected from AID test set
Nb    PSNR/dB (test set)    SSIM (test set)
1     27.76                 0.7690
2     27.78                 0.7724
3     27.94                 0.7769
4     27.86                 0.7740
Tab.3 Average PSNR and SSIM metrics across six scenes in AID dataset for deep feature extraction modules with varying counts
Fig.6 Visualization of input and output feature maps for high-frequency and global modules
[1] ZHANG H, YANG Z, ZHANG L, et al. Super-resolution reconstruction for multi-angle remote sensing images considering resolution differences [J]. Remote Sensing, 2014, 6 (1): 637-657. doi: 10.3390/rs6010637.
[2] PAPATHANASSIOU C, PETROU M. Super resolution: an overview [C]// IEEE International Geoscience and Remote Sensing Symposium. Seoul: IEEE, 2005: 5655-5658.
[3] GLASNER D, BAGON S, IRANI M. Super-resolution from a single image [C]// IEEE 12th International Conference on Computer Vision. Kyoto: IEEE, 2009: 349-356.
[4] DONG C, LOY C C, HE K, et al. Image super-resolution using deep convolutional networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38 (2): 295-307.
[5] LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 136-144.
[6] BEGIN I, FERRIE F R. Blind super-resolution using a learning-based approach [C]// Proceedings of the 17th International Conference on Pattern Recognition. Cambridge: IEEE, 2004: 85-89.
[7] JOSHI M V, CHAUDHURI S, PANUGANTI R. A learning-based method for image super-resolution from zoomed observations [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2005, 35 (3): 527-537. doi: 10.1109/TSMCB.2005.846647.
[8] CHAN T M, ZHANG J. An improved super-resolution with manifold learning and histogram matching [C]// Advances in Biometrics: International Conference. Hong Kong: Springer, 2005: 756-762.
[9] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Ethiopia: [s. n.], 2020.
[10] DONG C, LOY C C, HE K, et al. Learning a deep convolutional network for image super-resolution [C]// 13th European Conference on Computer Vision. Switzerland: Springer, 2014: 184-199.
[11] KIM J, LEE J K, LEE K M. Accurate image super-resolution using very deep convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1646-1654.
[12] LI W, ZHOU K, QI L, et al. LAPAR: linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond [J]. Advances in Neural Information Processing Systems, 2020, 33: 20343-20355.
[13] LIANG J, CAO J, SUN G, et al. SwinIR: image restoration using swin transformer [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1833-1844.
[14] CHEN H, WANG Y, GUO T, et al. Pre-trained image processing transformer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 12299-12310.
[15] LEI S, SHI Z, ZOU Z. Super-resolution for remote sensing images via local–global combined network [J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14 (8): 1243-1247. doi: 10.1109/LGRS.2017.2704122.
[16] PAN Z, MA W, GUO J, et al. Super-resolution of single remote sensing image based on residual dense backprojection networks [J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57 (10): 7918-7933. doi: 10.1109/TGRS.2019.2917427.
[17] ZHANG D, SHAO J, LI X, et al. Remote sensing image super-resolution via mixed high-order attention network [J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59 (6): 5183-5196.
[18] BAI J, YUAN L, XIA S T, et al. Improving vision transformers by revisiting high-frequency components [C]// European Conference on Computer Vision. Cham: Springer, 2022: 1-18.
[19] ELAD M, FEUER A. Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images [J]. IEEE Transactions on Image Processing, 1997, 6 (12): 1646-1658. doi: 10.1109/83.650118.
[20] LIU C, SUN D. On Bayesian adaptive video super resolution [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36 (2): 346-360.
[21] ZHANG K, LIANG J, VAN GOOL L, et al. Designing a practical degradation model for deep blind image super-resolution [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 4791-4800.
[22] LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
[23] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [EB/OL]. [2024-05-15]. https://arxiv.org/abs/1706.03762.
[24] ZAMIR S W, ARORA A, KHAN S, et al. Restormer: efficient transformer for high-resolution image restoration [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 5728-5739.
[25] XIA G, HU J, HU F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification [J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55 (7): 3965-3981. doi: 10.1109/TGRS.2017.2685945.
[26] DAI D, YANG W. Satellite image classification via two-layer sparse coding with biased image representation [J]. IEEE Geoscience and Remote Sensing Letters, 2010, 8 (1): 173-176.
[27] TANCHENKO A. Visual-PSNR measure of image quality [J]. Journal of Visual Communication and Image Representation, 2014, 25 (5): 874-878. doi: 10.1016/j.jvcir.2014.01.008.
[28] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13 (4): 600-612. doi: 10.1109/TIP.2003.819861.
[29] ZHANG W, LI X, SHI G, et al. Real-world image super-resolution as multi-task learning [J]. Advances in Neural Information Processing Systems, 2023, 36: 21003-21022.
[30] WANG X, XIE L, DONG C, et al. Real-ESRGAN: training real-world blind super-resolution with pure synthetic data [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 1905-1914.
[31] MOU C, WU Y, WANG X, et al. Metric learning based interactive modulation for real-world super-resolution [C]// European Conference on Computer Vision. Cham: Springer, 2022: 723-740.
[32] WEI P, XIE Z, LU H, et al. Component divide-and-conquer for real-world image super-resolution [C]// 16th European Conference on Computer Vision. Glasgow: Springer, 2020: 101-117.