Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (5): 890-901    DOI: 10.3785/j.issn.1008-973X.2025.05.002
    
Classification network for chest disease based on convolution-assisted self-attention
Ziran ZHANG, Qiang LI, Xin GUAN*
School of Microelectronics, Tianjin University, Tianjin 300072, China

Abstract  

A chest disease classification network based on convolution-assisted window self-attention, CAWSNet, was proposed to address the varying lesion sizes, complex textures, and mutual interference among lesions in chest X-ray images. The Swin Transformer was used as the backbone, with window self-attention modeling long-range visual dependencies; convolution was introduced to strengthen local feature extraction while compensating for the deficiencies of window self-attention. Image relative position encoding dynamically computed directed relative positions, helping the network better model pixel-wise spatial relationships. Class-specific residual attention adjusted the classifier's focus area according to the disease category, highlighting effective information and improving multi-label classification. A dynamic difficulty loss function was proposed to alleviate the large differences in classification difficulty across diseases and the imbalance of positive and negative samples in the datasets. Experimental results on the public datasets ChestX-Ray14, CheXpert and MIMIC-CXR-JPG show that the proposed CAWSNet achieves AUC scores of 0.853, 0.898 and 0.819, respectively, confirming the effectiveness and robustness of the network for diagnosing chest diseases from X-ray images.
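The window self-attention that CAWSNet inherits from its Swin Transformer backbone [12] restricts attention to non-overlapping local windows. The following is a minimal single-head NumPy illustration (identity q/k/v projections, no window shifting), a sketch rather than the paper's implementation:

```python
import numpy as np

def window_self_attention(x, window=4):
    """Minimal single-head windowed self-attention (Swin-style [12]).

    x: (H, W, C) feature map. Attention is computed independently inside
    each non-overlapping window x window patch. Projection weights are
    omitted (q = k = v = x) to keep the sketch short.
    """
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            patch = x[i:i+window, j:j+window].reshape(-1, C)   # (w*w, C)
            scores = patch @ patch.T / np.sqrt(C)              # (w*w, w*w)
            attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn /= attn.sum(axis=-1, keepdims=True)           # row softmax
            out[i:i+window, j:j+window] = (attn @ patch).reshape(window, window, C)
    return out
```

Because each output position is a convex combination of features within its own window, global context is only obtained by shifting the window grid between successive blocks (Fig. 3), which is what the convolution assistance in CAWSNet complements locally.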



Key words: chest X-ray image classification; window self-attention; convolution; image relative position encoding; dynamic difficulty loss function
Received: 01 March 2024      Published: 25 April 2025
CLC:  TP 391  
Fund: National Natural Science Foundation of China (62071323); Open Project of the State Key Laboratory of Ultrasound in Medicine and Engineering (2022KFKT004); Natural Science Foundation of Tianjin (22JCZDJC00220)
Corresponding author: Xin GUAN. E-mail: 260077200@qq.com; guanxin@tju.edu.cn
Cite this article:
Ziran ZHANG, Qiang LI, Xin GUAN. Classification network for chest disease based on convolution-assisted self-attention. Journal of ZheJiang University (Engineering Science), 2025, 59(5): 890-901.
URL: https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.05.002 OR https://www.zjujournals.com/eng/Y2025/V59/I5/890

Fig.1 Overall architecture of CAWSNet
Fig.2 A group of CAWS Transformer blocks
Fig.3 Window shift method
Fig.4 Implementation method of CAWS module
Fig.5 Class-specific residual attention classifier
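The class-specific residual attention head of Fig. 5 follows Zhu and Wu [14]: a global-average baseline is combined with a per-class spatial attention pooling, so each disease class attends to its own regions. A NumPy sketch under assumed hyperparameters (the values of λ and the temperature T are illustrative, not the paper's):

```python
import numpy as np

def csra_logits(score_map, lam=0.1, T=4.0):
    """Class-specific residual attention head (after [14]); a sketch,
    not the paper's exact configuration.

    score_map: (L, K) per-location classification scores, with L spatial
    positions and K disease classes. A global-average baseline is added
    to a class-specific attention pooling with temperature T.
    """
    base = score_map.mean(axis=0)                        # (K,) global average
    a = np.exp(T * (score_map - score_map.max(axis=0)))  # numerically stabilised
    a /= a.sum(axis=0, keepdims=True)                    # per-class spatial softmax
    attn = (a * score_map).sum(axis=0)                   # (K,) attention pooling
    return base + lam * attn                             # residual combination
```

The residual term emphasises the locations where a given class scores highest, which is what lets the classifier's focus area vary per disease in a multi-label setting.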
| Disease (AUC)    | U-Ignore | U-Zeros | U-Ones | PCAN  | DCNN  | MAE   | CAWSNet |
| Atelectasis      | 0.818    | 0.811   | 0.858  | 0.848 | 0.825 | 0.827 | 0.835   |
| Cardiomegaly     | 0.828    | 0.840   | 0.832  | 0.865 | 0.855 | 0.835 | 0.856   |
| Consolidation    | 0.938    | 0.932   | 0.899  | 0.908 | 0.937 | 0.925 | 0.917   |
| Edema            | 0.934    | 0.929   | 0.941  | 0.912 | 0.930 | 0.938 | 0.953   |
| Pleural effusion | 0.928    | 0.931   | 0.934  | 0.940 | 0.923 | 0.941 | 0.928   |
| Mean             | 0.889    | 0.889   | 0.893  | 0.895 | 0.894 | 0.893 | 0.898   |
Tab.2 Comparison of results of different chest disease classification networks on CheXpert validation set
Fig.6 ROC curves and AUC values of chest diseases on ChestX-ray14 test set
Fig.7 ROC curves and AUC values of chest diseases on CheXpert validation set
Fig.8 ROC curves and AUC values of chest diseases on MIMIC-CXR test set
| Disease (AUC)      | MXT   | TransDD | PCAN  | PCSANet | LCT   | CheXGAT | SSGE  | ML-LGL | CAWSNet |
| Atelectasis        | 0.798 | 0.791   | 0.785 | 0.807   | 0.789 | 0.787   | 0.792 | 0.782  | 0.829   |
| Cardiomegaly       | 0.896 | 0.885   | 0.897 | 0.910   | 0.889 | 0.879   | 0.892 | 0.904  | 0.918   |
| Effusion           | 0.842 | 0.842   | 0.837 | 0.879   | 0.842 | 0.837   | 0.840 | 0.835  | 0.892   |
| Infiltration       | 0.719 | 0.715   | 0.706 | 0.698   | 0.694 | 0.699   | 0.714 | 0.707  | 0.726   |
| Mass               | 0.856 | 0.837   | 0.834 | 0.824   | 0.843 | 0.839   | 0.848 | 0.853  | 0.857   |
| Nodule             | 0.809 | 0.803   | 0.786 | 0.750   | 0.803 | 0.793   | 0.812 | 0.779  | 0.784   |
| Pneumonia          | 0.758 | 0.745   | 0.730 | 0.750   | 0.742 | 0.741   | 0.733 | 0.739  | 0.782   |
| Pneumothorax       | 0.879 | 0.885   | 0.871 | 0.850   | 0.896 | 0.879   | 0.885 | 0.889  | 0.903   |
| Consolidation      | 0.759 | 0.753   | 0.763 | 0.802   | 0.757 | 0.755   | 0.753 | 0.771  | 0.820   |
| Edema              | 0.849 | 0.859   | 0.849 | 0.888   | 0.858 | 0.851   | 0.848 | 0.866  | 0.906   |
| Emphysema          | 0.906 | 0.944   | 0.921 | 0.890   | 0.944 | 0.945   | 0.948 | 0.949  | 0.935   |
| Fibrosis           | 0.847 | 0.849   | 0.817 | 0.812   | 0.863 | 0.842   | 0.827 | 0.846  | 0.827   |
| Pleural thickening | 0.800 | 0.803   | 0.791 | 0.768   | 0.799 | 0.794   | 0.795 | 0.787  | 0.817   |
| Hernia             | 0.913 | 0.924   | 0.943 | 0.915   | 0.915 | 0.931   | 0.932 | 0.907  | 0.939   |
| Mean               | 0.830 | 0.831   | 0.824 | 0.825   | 0.831 | 0.827   | 0.830 | 0.830  | 0.853   |
Tab.1 Comparison of results of different chest disease classification networks on ChestX-Ray14 test set
| Disease (AUC)             | MVCNet | MMBT  | MedCLIP | CAWSNet |
| Atelectasis               | 0.818  | 0.758 | —       | 0.841   |
| Cardiomegaly              | 0.848  | 0.826 | —       | 0.824   |
| Consolidation             | 0.829  | 0.771 | —       | 0.833   |
| Edema                     | 0.919  | 0.843 | —       | 0.900   |
| Enlarged cardiomediastinum| 0.725  | 0.743 | —       | 0.771   |
| Fracture                  | 0.665  | 0.729 | —       | 0.660   |
| Lung lesion               | 0.740  | 0.759 | —       | 0.804   |
| Lung opacity              | 0.757  | 0.715 | —       | 0.748   |
| No finding                | 0.842  | 0.831 | —       | 0.867   |
| Pleural effusion          | 0.947  | 0.886 | —       | 0.922   |
| Pleural other             | 0.825  | 0.869 | —       | 0.858   |
| Pneumonia                 | 0.715  | 0.752 | —       | 0.758   |
| Pneumothorax              | 0.899  | 0.880 | —       | 0.861   |
| Mean                      | 0.810  | 0.797 | 0.804   | 0.819   |
Tab.3 Comparison of results of different chest disease classification networks on MIMIC-CXR test set
| Generation method (mean AUC)  | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Bias                          | 0.849        | 0.890    | 0.813     |
| Interaction with input k      | 0.853        | 0.898    | 0.819     |
| Interaction with inputs q, k  | 0.850        | 0.894    | 0.816     |
Tab.4 Effect of position encoding generation method on network classification performance
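The "interaction with input k" setting in Tab. 4 can be read in the spirit of image relative position encoding [13]: instead of a pure learned bias, a learnable relative-position embedding contributes to the attention logit through the key content. A simplified 1-D NumPy sketch (all identifiers are illustrative, not from the paper):

```python
import numpy as np

def attn_scores_with_rpe(q, k, rel_emb, rel_idx, mode="k"):
    """Attention logits with relative position encoding (after [13]);
    a simplified 1-D sketch.

    q, k: (N, d) queries and keys; rel_emb: (num_buckets, d) learnable
    embeddings; rel_idx: (N, N) bucket index of the relative position.
    mode 'bias' adds a learned scalar per offset; mode 'k' lets the
    encoding interact with the key content (the best row of Tab. 4).
    """
    d = q.shape[1]
    scores = q @ k.T / np.sqrt(d)
    r = rel_emb[rel_idx]                                     # (N, N, d)
    if mode == "bias":
        scores = scores + r[..., 0]                          # scalar bias
    elif mode == "k":
        scores = scores + np.einsum("ijd,jd->ij", r, k) / np.sqrt(d)
    return scores
```

In 2-D images the bucket index would encode a directed (signed) horizontal and vertical offset, which is what lets the encoding distinguish direction as described in the abstract.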
| Weighting position (mean AUC) | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Weight input q                | 0.852        | 0.895    | 0.817     |
| Weight input k                | 0.851        | 0.896    | 0.817     |
| Weight input v                | 0.853        | 0.898    | 0.819     |
Tab.5 Effect of channel-reinforcement weighting position on network classification performance
| CAWS | CSRA | IRPE | ChestX-ray14 | CheXpert | MIMIC-CXR |
| ?    | ?    | ?    | 0.853        | 0.898    | 0.819     |
| ?    | ?    | ?    | 0.844        | 0.886    | 0.811     |
| ?    | ?    | ?    | 0.849        | 0.894    | 0.814     |
| ?    | ?    | ?    | 0.850        | 0.891    | 0.816     |
Tab.6 Effect of different modules on network classification performance (mean AUC)
| Loss function (mean AUC)      | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Cross-entropy loss            | 0.846        | 0.889    | 0.810     |
| Focal loss                    | 0.850        | 0.892    | 0.816     |
| Dynamic difficulty loss       | 0.853        | 0.898    | 0.819     |
Tab.7 Effect of different loss functions on network classification performance
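Of the baselines in Tab. 7, focal loss [25] is the direct ancestor of the proposed loss. A minimal binary (multi-label) focal loss in NumPy is shown below; the dynamic, per-class difficulty weighting that the paper proposes extends this idea and is not reproduced here:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss [25] for multi-label chest disease outputs.

    p: predicted probabilities in (0, 1); y: binary labels (0 or 1).
    gamma down-weights easy, well-classified examples; alpha balances
    positive and negative samples.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)         # probability of the true label
    a = np.where(y == 1, alpha, 1 - alpha)  # positive/negative balance
    return -(a * (1 - pt) ** gamma * np.log(pt)).mean()
```

With gamma = 0 and alpha = 0.5 this reduces (up to a factor of 0.5) to binary cross-entropy, the first row of Tab. 7; increasing gamma shifts the gradient budget toward hard examples, which is the mechanism the dynamic difficulty loss adapts per disease.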
| Dataset split (mean AUC)            | CAWSNet | Swin Transformer |
| Folds 2, 3, 4, 5 train; fold 1 test | 0.841   | 0.832            |
| Folds 1, 3, 4, 5 train; fold 2 test | 0.836   | 0.830            |
| Folds 1, 2, 4, 5 train; fold 3 test | 0.834   | 0.829            |
| Folds 1, 2, 3, 5 train; fold 4 test | 0.840   | 0.832            |
| Folds 1, 2, 3, 4 train; fold 5 test | 0.841   | 0.833            |
| Mean                                | 0.838   | 0.831            |
Tab.8 Results of stratified k-fold cross-validation
| Network        | FLOPs/10^9 | t_inf/s | Mean AUC |
| SSGE[21]       | 17.74      | 0.059   | 0.830    |
| CheXGCN[37]    | 17.86      | 0.061   | 0.826    |
| PCAN[29]       | 3.92       | 0.054   | 0.830    |
| Swin Trans[12] | 4.37       | 0.013   | 0.837    |
| CAWSNet        | 4.52       | 0.018   | 0.853    |
Tab.9 Comparison of computational complexity of different chest disease classification networks
Fig.9 Doctor's marked lesion area (left) and Grad-CAM heat map (right)
Fig.10 Example of disease prediction score in CXR image
[1]   JACOBI A, CHUNG M, BERNHEIM A, et al. Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review[J]. Clinical Imaging, 2020, 64 (8): 35- 42
[2]   HEIDARI A, NAVIMIPOUR N J, UNAL M, et al. The COVID-19 epidemic analysis and diagnosis using deep learning: a systematic literature review and future directions[J]. Computers in Biology and Medicine, 2022, 141: 105141
doi: 10.1016/j.compbiomed.2021.105141
[3]   ZHENG Guangyuan, LIU Xiabi, HAN Guanghui. Survey on medical image computer aided detection and diagnosis systems[J]. Journal of Software, 2018, 29 (5): 1471-1514
[4]   CHEN J T, YU H Y, FENG R W, et al. Flow-Mixup: classifying multi-labeled medical images with corrupted labels [C]// IEEE International Conference on Bioinformatics and Biomedicine. Seoul: IEEE, 2020: 534-541.
[5]   ANWAR S M, MAJID M, QAYYUM A, et al. Medical image analysis using convolutional neural networks: a review[J]. Journal of Medical Systems, 2018, 42 (11): 226
doi: 10.1007/s10916-018-1088-1
[6]   YI X, WALIA E, BABYN P. Generative adversarial network in medical imaging: a review[J]. Medical Image Analysis, 2019, 58: 101552
doi: 10.1016/j.media.2019.101552
[7]   ZHOU S K, LE H N, LUU K, et al. Deep reinforcement learning in medical imaging: a literature review[J]. Medical Image Analysis, 2021, 73: 102193
doi: 10.1016/j.media.2021.102193
[8]   LI Q, LAI Y, ADAMU M J. Multi-level residual feature fusion network for thoracic disease classification in chest X-ray images[J]. IEEE Access, 2023, 11: 40988-41002
[9]   HU Jinbo, NIE Weizhi, SONG Dan, et al. Chest X-ray imaging disease diagnosis model assisted by deformable Transformer[J]. Journal of Zhejiang University: Engineering Science, 2023, 57 (10): 1923-1932
[10]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . [S. l.]: Curran Associates, 2017: 6000-6010.
[11]   DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03) [2023-08-05]. https://arxiv.org/pdf/2010.11929.pdf.
[12]   LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[13]   WU K, PENG H W, CHEN M H, et al. Rethinking and improving relative position encoding for vision [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10013-10021.
[14]   ZHU K, WU J K. Residual attention: a simple but effective method for multi-label recognition [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 184-193.
[15]   WANG X S, PENG Y F, LU L, et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases [C]// 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3462-3471.
[16]   IRVIN J, RAJPURKAR P, KO M, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison [C]// 33rd AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 590-597.
[17]   JOHNSON A E W, POLLARD T J, BERKOWITZ S J, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports[J]. Scientific Data, 2019, 6 (1): 317
doi: 10.1038/s41597-019-0322-0
[18]   CHEN B Z, LI J X, GUO X B, et al. DualCheXNet: dual asymmetric feature learning for thoracic disease classification in chest X-rays[J]. Biomedical Signal Processing and Control, 2019, 53: 101554
doi: 10.1016/j.bspc.2019.04.031
[19]   WANG H Y, WANG S S, QIN Z B, et al. Triple attention learning for classification of 14 thoracic diseases using chest radiography[J]. Medical Image Analysis, 2021, 67 (1): 8415- 8423
[20]   CHEN K, WANG X Q, ZHANG S W. Thorax disease classification based on pyramidal convolution shuffle attention neural network[J]. IEEE Access, 2022, 10: 85571-85581
doi: 10.1109/ACCESS.2022.3198958
[21]   CHEN B Z, ZHANG Z, LI Y Z, et al. Multi-label chest X-ray image classification via semantic similarity graph embedding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (4): 2455- 2468
doi: 10.1109/TCSVT.2021.3079900
[22]   JIANG X B, ZHU Y, GAI G, et al. MXT: a new variant of pyramid vision Transformer for multi-label chest X-ray image classification[J]. Cognitive Computation, 2022, 14 (4): 1362- 1377
doi: 10.1007/s12559-022-10032-4
[23]   PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution [C]// IEEE Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 805-815.
[24]   HU J, SHEN L, SU G. Squeeze-and-Excitation networks [C]// 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[25]   LIN T, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (2): 318- 327
doi: 10.1109/TPAMI.2018.2858826
[26]   PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library [C]// 33rd Conference on Neural Information Processing Systems . Vancouver: [s. n. ], 2019: 32.
[27]   PHAM H H, LE T T, TRAN D Q, et al. Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels[J]. Neurocomputing, 2021, 437: 186- 194
doi: 10.1016/j.neucom.2020.03.127
[28]   JIANG X B, ZHU Y, LIU Y T, et al. TransDD: a transformer-based dual-path decoder for improving the performance of thoracic diseases classification using chest X-ray[J]. Biomedical Signal Processing and Control, 2024, 91: 13
[29]   ZHU X F, PANG S M, ZHANG X X, et al. PCAN: pixel-wise classification and attention network for thoracic disease classification and weakly supervised localization[J]. Computerized Medical Imaging and Graphics, 2022, 102: 102137
doi: 10.1016/j.compmedimag.2022.102137
[30]   SUN Z X, QU L H, LUO J Z, et al. Label correlation transformer for automated chest X-ray diagnosis with reliable interpretability[J]. Radiologia Medica, 2023, 128 (6): 726-733
doi: 10.1007/s11547-023-01647-0
[31]   LEE Y W, HUANG S K, CHANG R F. CheXGAT: a disease correlation-aware network for thorax disease diagnosis from chest X-ray images[J]. Artificial Intelligence in Medicine, 2022, 132: 102382
doi: 10.1016/j.artmed.2022.102382
[32]   XIAO J F, BAI Y T, YUILLE A, et al. Delving into masked autoencoders for multi-label thorax disease classification [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 3577-3589.
[33]   LIU Z, CHENG Y Z, TAMURA S. Multi-label local to global learning: a novel learning paradigm for chest X-ray abnormality classification[J]. IEEE Journal of Biomedical and Health Informatics, 2023, 27 (9): 4409-4420
doi: 10.1109/JBHI.2023.3281466
[34]   ZHU X F, FENG Q. MVC-NET: multi-view chest radiograph classification network with deep fusion [C]// 18th IEEE International Symposium on Biomedical Imaging. Nice: IEEE, 2021: 554-558.
[35]   JACENKOW G, O'NEIL A Q, TSAFTARIS S A. Indication as prior knowledge for multimodal disease classification in chest radiographs with transformers [C]// IEEE International Symposium on Biomedical Imaging. Kolkata: IEEE, 2022.
[36]   SEIBOLD C, REISS S, SARFRAZ M S, et al. Breaking with fixed set pathology recognition through report-guided contrastive training [C]// Medical Image Computing and Computer Assisted Intervention. Singapore: Springer, 2022, 13435: 690-700.
[37]   CHEN B, LI J, LU G, et al. Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24 (8): 2292-2302
doi: 10.1109/JBHI.2020.2967084