Multi-scale residual learning combined with Dilformer for dual-stream medical image registration network

doi:10.3785/j.issn.1008-973X.2026.05.017

Journal of ZheJiang University (Engineering Science)

2026, Vol. 60, Issue (5): 1082-1091


Jing PENG, Jiarong YAN, Jiaying LIU, Ziyi WEI, Shan BAI, Yahong DENG

School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China


Abstract

To address the low registration accuracy under complex deformations and the limited generalization ability of existing medical image registration algorithms, a dual-stream registration network that integrates multi-scale residual learning with a multi-dilated perception Transformer (Dilformer) was proposed. First, a multi-scale residual learning block (MSR) was introduced to enhance feature representation during the dual-stream pyramid feature extraction stage. Then, the Dilformer module was designed to construct a heterogeneous receptive field interaction mechanism using multi-rate dilated convolutions, thereby improving the model’s global modeling capacity at low-resolution scales. Subsequently, a separable residual fusion block (SRF) was developed to fuse multi-scale features and improve the accuracy of the predicted deformation field. Finally, a multi-resolution loss function was introduced to supervise network training at multiple scales, further improving registration performance. Experimental results on the 3D brain MRI datasets LPBA40 and preprocessed IXI show that the proposed network achieves higher registration accuracy than the existing models it was compared against. Specifically, on the IXI dataset, the proposed network achieves a Dice similarity coefficient of 0.769, a 95th percentile Hausdorff distance of 8.937, a negative Jacobian determinant rate of 0.029, and an inference time of 0.29 s. These results confirm the effectiveness and practical applicability of the proposed network for medical image registration under complex deformations.
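Two of the metrics reported above, the Dice similarity coefficient and the negative Jacobian determinant rate, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the flat label-list layout, the dictionary-based displacement field, the forward-difference Jacobian, and the choice to count determinants that are less than or equal to zero are all assumptions made for illustration.

```python
from itertools import product


def dice(seg_a, seg_b, label):
    """Dice similarity coefficient for one label in two equally sized label maps.

    seg_a and seg_b are flat sequences of integer labels (assumed layout).
    """
    in_a = [v == label for v in seg_a]
    in_b = [v == label for v in seg_b]
    overlap = sum(1 for a, b in zip(in_a, in_b) if a and b)
    total = sum(in_a) + sum(in_b)
    # Assumed convention: a label absent from both maps scores 1.0.
    return 2.0 * overlap / total if total else 1.0


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def nonpositive_jacobian_fraction(disp, shape):
    """Fraction of voxels where det(J) <= 0 for the mapping phi(p) = p + u(p).

    disp maps a voxel index (x, y, z) to its displacement (ux, uy, uz).
    Derivatives use forward differences, so the last slice along each
    axis is skipped (an assumption of this sketch).
    """
    size_x, size_y, size_z = shape
    folded = 0
    count = 0
    for x, y, z in product(range(size_x - 1), range(size_y - 1), range(size_z - 1)):
        u0 = disp[(x, y, z)]
        neighbours = [disp[(x + 1, y, z)], disp[(x, y + 1, z)], disp[(x, y, z + 1)]]
        # J[i][j] = d(phi_i)/d(x_j) = delta_ij + d(u_i)/d(x_j)
        jac = [[(1.0 if i == j else 0.0) + neighbours[j][i] - u0[i]
                for j in range(3)] for i in range(3)]
        count += 1
        if det3(jac) <= 0:
            folded += 1
    return folded / count


# Toy inputs: two label maps, a zero displacement field, and a field that
# mirrors the x axis so that every voxel folds.
shape = (3, 3, 3)
grid = list(product(range(3), repeat=3))
identity = {p: (0.0, 0.0, 0.0) for p in grid}
mirrored = {p: (-2.0 * p[0], 0.0, 0.0) for p in grid}

dice_label_1 = dice([0, 1, 1, 2], [0, 1, 2, 2], label=1)
identity_rate = nonpositive_jacobian_fraction(identity, shape)
mirrored_rate = nonpositive_jacobian_fraction(mirrored, shape)
```

A higher Dice value indicates better overlap between the warped and fixed segmentations, while a lower folding rate indicates a smoother, more plausible deformation field. The abstract does not state whether its reported rate is a fraction or a percentage, so this sketch returns a plain fraction.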

Key words: image registration, dilated convolution, Transformer, MRI image, multi-resolution loss

Received: 03 June 2025 Published: 06 May 2026

CLC: TP391

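Fund: National Natural Science Foundation of China (62241106, 61861025); Intelligent Tunnel Supervision Robot Research Project (中铁科研院字2020-KJ016-Z016-A2); Gansu Provincial Key Research and Development Program (甘科计[2024]10号-24YFGA037); Gansu Provincial Science and Technology Commissioner Special Project (甘科计[2023]18号-23CXGA0008).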


Cite this article:

Jing PENG, Jiarong YAN, Jiaying LIU, Ziyi WEI, Shan BAI, Yahong DENG. Multi-scale residual learning combined with Dilformer for dual-stream medical image registration network. Journal of ZheJiang University (Engineering Science), 2026, 60(5): 1082-1091.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.05.017 OR https://www.zjujournals.com/eng/Y2026/V60/I5/1082


Fig.1 Unsupervised dual-stream medical image registration framework

Fig.2 Multi-scale residual learning combined with Dilformer for dual-stream medical image registration network

Fig.3 Multi-scale residual learning block

Fig.4 Multi-dilated perception Transformer module

Fig.5 2D slice images from IXI and LPBA40 datasets

Fig.6 IXI dataset preprocessing procedure

Tab.1 Quantitative analysis results of each medical image registration model on IXI dataset

Fig.7 Qualitative analysis results of each medical image registration model on IXI dataset

Tab.2 Dilation rate analysis results on IXI dataset

Fig.8 Module ablation study results on IXI dataset

Tab.3 Comparison of performance metrics for module ablation study on IXI dataset

Fig.9 Model generalization performance experimental results on LPBA40 dataset

Tab.4 Generalization validation data of model on LPBA40 dataset



