Dual-stage deraining network based on mask and non-local attention

doi:10.3785/j.issn.1008-973X.2026.04.011

Journal of ZheJiang University (Engineering Science)

2026, Vol. 60

Issue (4): 791-799

Yuzhen HOU1, Xiaohong SHEN1,*, Li LI1, Mingyuan YANG1, Caiming ZHANG2,3

1. School of Computing and Artificial Intelligence, Shandong University of Finance and Economics, Jinan 250014, China
2. School of Software, Shandong University, Jinan 250101, China
3. Shandong Future Intelligent Finance Engineering Laboratory, Shandong Technology and Business University, Yantai 264003, China


Abstract

A dual-stage image deraining network based on rain streak mask suppression and non-local reconstruction collaboration was proposed to address severe rain streak noise interference and insufficient spatial global modeling capability of existing attention mechanisms in single-image deraining networks. In the first stage of the network, a rain streak mask attention mechanism was designed, in which rain streak masks were generated through morphological operations, to enhance the model’s ability to suppress rain streak interference by selectively masking rain-affected regions during feature extraction. In the second stage, a non-local attention mechanism was devised by employing a feature clustering-based non-local similarity measurement method to guide pixel rearrangement, which broke spatial constraints, thereby augmenting the long-range modeling capability of the sliding window attention mechanism and improving the deraining performance. Through progressive optimization based on the dual-stage “rain streak suppression-detail reconstruction” process, high-quality reconstruction of rain-free images was achieved. Experimental results on multiple public datasets demonstrate that the proposed network achieves significant improvements in both PSNR and SSIM metrics compared to other networks, effectively removing rain streaks while better preserving image details and producing high-quality restored results with natural-looking appearance and fine-grained texture representations.

Key words: image deraining; Transformer; rain streak mask; non-local attention; feature clustering
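The first stage described in the abstract (a mask built with morphological operations, then attention that selectively masks rain-affected regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which morphological operations are used, so a white top-hat transform (image minus its opening) is assumed here, and the function names `rain_streak_mask` and `masked_attention`, the 3x3 window, and the threshold are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(img, k, reduce_fn):
    """Apply min or max over a k x k neighbourhood (k odd), edge-padded."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return reduce_fn(windows, axis=(-2, -1))


def rain_streak_mask(gray, k=3, thresh=0.2):
    """Assumed mask generator: white top-hat followed by a threshold.

    gray: (H, W) float image in [0, 1]. Returns a boolean (H, W) mask,
    True where a thin bright structure (a candidate streak) is found.
    """
    eroded = _window_reduce(gray, k, np.min)      # grayscale erosion
    opened = _window_reduce(eroded, k, np.max)    # dilation of the erosion
    top_hat = gray - opened                       # keeps structures thinner than k
    return top_hat > thresh


def masked_attention(feats, rain_mask):
    """Self-attention in which rain-affected tokens are excluded as keys.

    feats: (N, C) per-pixel features; rain_mask: (N,) bool, True = rain.
    Returns the attended features (N, C) and the weights (N, N).
    """
    scale = feats.shape[1] ** -0.5
    logits = (feats @ feats.T) * scale
    if not rain_mask.all():                       # keep at least one valid key
        logits = np.where(rain_mask[None, :], -np.inf, logits)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats, weights
```

In this sketch every query pixel, including rain-covered ones, aggregates features only from pixels outside the mask, which is one plausible reading of "selectively masking rain-affected regions during feature extraction".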

Received: 14 July 2025 Published: 19 March 2026

CLC: TP 391.4

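Fund: National Natural Science Foundation of China (62202268); Central Government Guided Local Science and Technology Development Fund (YDZX2023079); Humanities and Social Sciences Project of the Ministry of Education (22YJA630086); Key Research and Development Program of Shandong Province (2024TSGC0118).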

Corresponding Author: Xiaohong SHEN. E-mail: houyuzhen921@163.com; xhshen@sdufe.edu.cn

Cite this article:

Yuzhen HOU, Xiaohong SHEN, Li LI, Mingyuan YANG, Caiming ZHANG. Dual-stage deraining network based on mask and non-local attention. Journal of ZheJiang University (Engineering Science), 2026, 60(4): 791-799.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.04.011 OR https://www.zjujournals.com/eng/Y2026/V60/I4/791

Chinese title: 基于掩模和非局部注意力的双阶段去雨网络


Fig.1 Framework of dual-stage deraining network based on mask and non-local attention

Fig.2 Rain streak mask generation block

Fig.3 Structure of masked attention block (MAB)

Fig.4 Structure of non-local attention block (NAB)

Tab.1 Comparison of objective evaluations of different methods

Fig.5 Comparison of deraining effect of different algorithms on DID-Data

Fig.6 Comparison of deraining effect of different algorithms on Rain200H

Tab.2 Results of ablation experiments on different modules of proposed method

Fig.7 Comparison of deraining effect of ablation experiments
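The second-stage idea from the abstract (clustering features, rearranging pixels so that similar ones fall into the same attention window, then attending within windows) can be sketched as follows. This is an illustration under stated assumptions, not the paper's non-local attention block: plain k-means with a deterministic initialisation stands in for the unspecified clustering method, windows are fixed-size chunks of the rearranged sequence, and all function names are hypothetical.

```python
import numpy as np


def kmeans_labels(feats, n_clusters, n_iters=10):
    """Tiny k-means (assumed clustering choice); returns a label per row."""
    # Deterministic init: evenly spaced rows serve as the initial centres.
    idx = np.linspace(0, len(feats) - 1, n_clusters).astype(int)
    centers = feats[idx].copy()
    for _ in range(n_iters):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = feats[labels == c]
            if len(members):                      # skip empty clusters
                centers[c] = members.mean(axis=0)
    return labels


def window_attention(x):
    """Plain self-attention inside each window; x: (num_windows, size, C)."""
    scale = x.shape[-1] ** -0.5
    logits = (x @ x.transpose(0, 2, 1)) * scale
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def nonlocal_window_attention(feats, window_size, n_clusters):
    """Rearrange pixels by cluster, attend per window, undo the rearrangement.

    feats: (N, C) flattened pixel features, N divisible by window_size.
    Returns the output in the original pixel order and the permutation used.
    """
    n, c = feats.shape
    if n % window_size:
        raise ValueError("N must be divisible by window_size")
    labels = kmeans_labels(feats, n_clusters)
    perm = np.argsort(labels, kind="stable")      # similar pixels become adjacent
    grouped = feats[perm].reshape(n // window_size, window_size, c)
    out = window_attention(grouped).reshape(n, c)
    restored = np.empty_like(out)
    restored[perm] = out                          # scatter back to original order
    return restored, perm
```

Because the permutation depends on feature similarity rather than position, pixels that are far apart in the image can share a window, which is how the rearrangement lets window attention reach beyond its local spatial extent.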



