Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (5): 890-901    DOI: 10.3785/j.issn.1008-973X.2025.05.002
    
Classification network for chest disease based on convolution-assisted self-attention
Ziran ZHANG, Qiang LI, Xin GUAN*
School of Microelectronics, Tianjin University, Tianjin 300072, China

Abstract  

A chest disease classification network based on convolution-assisted window self-attention, CAWSNet, was proposed to address the varying lesion sizes, complex textures, and mutual interference among lesions in chest X-ray images. The Swin Transformer was used as the backbone, with window self-attention modeling long-range visual dependencies; convolution was introduced to strengthen local feature extraction while compensating for the deficiencies of window self-attention. Image relative position encoding dynamically computed directed relative positions, helping the network better model pixel-wise spatial relationships. Class-specific residual attention adjusted the classifier's focus area according to the disease category, highlighting effective information and improving multi-label classification. A dynamic difficulty loss function was proposed to alleviate the large differences in classification difficulty across diseases and the imbalance of positive and negative samples in the datasets. Experimental results on the public datasets ChestX-Ray14, CheXpert and MIMIC-CXR-JPG show that the proposed CAWSNet achieves AUC scores of 0.853, 0.898 and 0.819, respectively, confirming the effectiveness and robustness of the network for diagnosing chest diseases from X-ray images.
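The window self-attention that CAWSNet inherits from its Swin Transformer backbone [12] restricts attention to non-overlapping local windows. The following is a minimal single-head NumPy illustration (identity q/k/v projections, no window shifting), a sketch rather than the paper's implementation:

```python
import numpy as np

def window_self_attention(x, window=4):
    """Minimal single-head windowed self-attention (Swin-style [12]).

    x: (H, W, C) feature map. Attention is computed independently inside
    each non-overlapping window x window patch. Projection weights are
    omitted (q = k = v = x) to keep the sketch short.
    """
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            patch = x[i:i+window, j:j+window].reshape(-1, C)   # (w*w, C)
            scores = patch @ patch.T / np.sqrt(C)              # (w*w, w*w)
            attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn /= attn.sum(axis=-1, keepdims=True)           # row softmax
            out[i:i+window, j:j+window] = (attn @ patch).reshape(window, window, C)
    return out
```

Because each output position is a convex combination of features within its own window, global context is only obtained by shifting the window grid between successive blocks (Fig. 3), which is what the convolution assistance in CAWSNet complements locally.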



Key words: chest X-ray image classification; window self-attention; convolution; image relative position encoding; dynamic difficulty loss function
Received: 01 March 2024      Published: 25 April 2025
CLC:  TP 391  
Fund: National Natural Science Foundation of China (62071323); Open Project of the State Key Laboratory of Ultrasound in Medicine and Engineering (2022KFKT004); Natural Science Foundation of Tianjin (22JCZDJC00220)
Corresponding author: Xin GUAN. E-mail: 260077200@qq.com; guanxin@tju.edu.cn
Cite this article:
Ziran ZHANG, Qiang LI, Xin GUAN. Classification network for chest disease based on convolution-assisted self-attention. Journal of ZheJiang University (Engineering Science), 2025, 59(5): 890-901.
URL: https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.05.002 OR https://www.zjujournals.com/eng/Y2025/V59/I5/890

Fig.1 Overall architecture of CAWSNet
Fig.2 A group of CAWS Transformer blocks
Fig.3 Window shift method
Fig.4 Implementation method of CAWS module
Fig.5 Class-specific residual attention classifier
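The class-specific residual attention head of Fig. 5 follows Zhu and Wu [14]: a global-average baseline is combined with a per-class spatial attention pooling, so each disease class attends to its own regions. A NumPy sketch under assumed hyperparameters (the values of λ and the temperature T are illustrative, not the paper's):

```python
import numpy as np

def csra_logits(score_map, lam=0.1, T=4.0):
    """Class-specific residual attention head (after [14]); a sketch,
    not the paper's exact configuration.

    score_map: (L, K) per-location classification scores, with L spatial
    positions and K disease classes. A global-average baseline is added
    to a class-specific attention pooling with temperature T.
    """
    base = score_map.mean(axis=0)                        # (K,) global average
    a = np.exp(T * (score_map - score_map.max(axis=0)))  # numerically stabilised
    a /= a.sum(axis=0, keepdims=True)                    # per-class spatial softmax
    attn = (a * score_map).sum(axis=0)                   # (K,) attention pooling
    return base + lam * attn                             # residual combination
```

The residual term emphasises the locations where a given class scores highest, which is what lets the classifier's focus area vary per disease in a multi-label setting.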
| Disease (AUC)    | U-Ignore | U-Zeros | U-Ones | PCAN  | DCNN  | MAE   | CAWSNet |
| Atelectasis      | 0.818    | 0.811   | 0.858  | 0.848 | 0.825 | 0.827 | 0.835   |
| Cardiomegaly     | 0.828    | 0.840   | 0.832  | 0.865 | 0.855 | 0.835 | 0.856   |
| Consolidation    | 0.938    | 0.932   | 0.899  | 0.908 | 0.937 | 0.925 | 0.917   |
| Edema            | 0.934    | 0.929   | 0.941  | 0.912 | 0.930 | 0.938 | 0.953   |
| Pleural effusion | 0.928    | 0.931   | 0.934  | 0.940 | 0.923 | 0.941 | 0.928   |
| Mean             | 0.889    | 0.889   | 0.893  | 0.895 | 0.894 | 0.893 | 0.898   |
Tab.2 Comparison of results of different chest disease classification networks on CheXpert validation set
Fig.6 ROC curves and AUC values of chest diseases on ChestX-ray14 test set
Fig.7 ROC curves and AUC values of chest diseases on CheXpert validation set
Fig.8 ROC curves and AUC values of chest diseases on MIMIC-CXR test set
| Disease (AUC)      | MXT   | TransDD | PCAN  | PCSANet | LCT   | CheXGAT | SSGE  | ML-LGL | CAWSNet |
| Atelectasis        | 0.798 | 0.791   | 0.785 | 0.807   | 0.789 | 0.787   | 0.792 | 0.782  | 0.829   |
| Cardiomegaly       | 0.896 | 0.885   | 0.897 | 0.910   | 0.889 | 0.879   | 0.892 | 0.904  | 0.918   |
| Effusion           | 0.842 | 0.842   | 0.837 | 0.879   | 0.842 | 0.837   | 0.840 | 0.835  | 0.892   |
| Infiltration       | 0.719 | 0.715   | 0.706 | 0.698   | 0.694 | 0.699   | 0.714 | 0.707  | 0.726   |
| Mass               | 0.856 | 0.837   | 0.834 | 0.824   | 0.843 | 0.839   | 0.848 | 0.853  | 0.857   |
| Nodule             | 0.809 | 0.803   | 0.786 | 0.750   | 0.803 | 0.793   | 0.812 | 0.779  | 0.784   |
| Pneumonia          | 0.758 | 0.745   | 0.730 | 0.750   | 0.742 | 0.741   | 0.733 | 0.739  | 0.782   |
| Pneumothorax       | 0.879 | 0.885   | 0.871 | 0.850   | 0.896 | 0.879   | 0.885 | 0.889  | 0.903   |
| Consolidation      | 0.759 | 0.753   | 0.763 | 0.802   | 0.757 | 0.755   | 0.753 | 0.771  | 0.820   |
| Edema              | 0.849 | 0.859   | 0.849 | 0.888   | 0.858 | 0.851   | 0.848 | 0.866  | 0.906   |
| Emphysema          | 0.906 | 0.944   | 0.921 | 0.890   | 0.944 | 0.945   | 0.948 | 0.949  | 0.935   |
| Fibrosis           | 0.847 | 0.849   | 0.817 | 0.812   | 0.863 | 0.842   | 0.827 | 0.846  | 0.827   |
| Pleural thickening | 0.800 | 0.803   | 0.791 | 0.768   | 0.799 | 0.794   | 0.795 | 0.787  | 0.817   |
| Hernia             | 0.913 | 0.924   | 0.943 | 0.915   | 0.915 | 0.931   | 0.932 | 0.907  | 0.939   |
| Mean               | 0.830 | 0.831   | 0.824 | 0.825   | 0.831 | 0.827   | 0.830 | 0.830  | 0.853   |
Tab.1 Comparison of results of different chest disease classification networks on ChestX-Ray14 test set
| Disease (AUC)             | MVCNet | MMBT  | MedCLIP | CAWSNet |
| Atelectasis               | 0.818  | 0.758 | —       | 0.841   |
| Cardiomegaly              | 0.848  | 0.826 | —       | 0.824   |
| Consolidation             | 0.829  | 0.771 | —       | 0.833   |
| Edema                     | 0.919  | 0.843 | —       | 0.900   |
| Enlarged cardiomediastinum| 0.725  | 0.743 | —       | 0.771   |
| Fracture                  | 0.665  | 0.729 | —       | 0.660   |
| Lung lesion               | 0.740  | 0.759 | —       | 0.804   |
| Lung opacity              | 0.757  | 0.715 | —       | 0.748   |
| No finding                | 0.842  | 0.831 | —       | 0.867   |
| Pleural effusion          | 0.947  | 0.886 | —       | 0.922   |
| Pleural other             | 0.825  | 0.869 | —       | 0.858   |
| Pneumonia                 | 0.715  | 0.752 | —       | 0.758   |
| Pneumothorax              | 0.899  | 0.880 | —       | 0.861   |
| Mean                      | 0.810  | 0.797 | 0.804   | 0.819   |
Tab.3 Comparison of results of different chest disease classification networks on MIMIC-CXR test set
| Generation method (mean AUC)  | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Bias                          | 0.849        | 0.890    | 0.813     |
| Interaction with input k      | 0.853        | 0.898    | 0.819     |
| Interaction with inputs q, k  | 0.850        | 0.894    | 0.816     |
Tab.4 Effect of position encoding generation method on network classification performance
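The "interaction with input k" setting in Tab. 4 can be read in the spirit of image relative position encoding [13]: instead of a pure learned bias, a learnable relative-position embedding contributes to the attention logit through the key content. A simplified 1-D NumPy sketch (all identifiers are illustrative, not from the paper):

```python
import numpy as np

def attn_scores_with_rpe(q, k, rel_emb, rel_idx, mode="k"):
    """Attention logits with relative position encoding (after [13]);
    a simplified 1-D sketch.

    q, k: (N, d) queries and keys; rel_emb: (num_buckets, d) learnable
    embeddings; rel_idx: (N, N) bucket index of the relative position.
    mode 'bias' adds a learned scalar per offset; mode 'k' lets the
    encoding interact with the key content (the best row of Tab. 4).
    """
    d = q.shape[1]
    scores = q @ k.T / np.sqrt(d)
    r = rel_emb[rel_idx]                                     # (N, N, d)
    if mode == "bias":
        scores = scores + r[..., 0]                          # scalar bias
    elif mode == "k":
        scores = scores + np.einsum("ijd,jd->ij", r, k) / np.sqrt(d)
    return scores
```

In 2-D images the bucket index would encode a directed (signed) horizontal and vertical offset, which is what lets the encoding distinguish direction as described in the abstract.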
| Weighting position (mean AUC) | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Weight input q                | 0.852        | 0.895    | 0.817     |
| Weight input k                | 0.851        | 0.896    | 0.817     |
| Weight input v                | 0.853        | 0.898    | 0.819     |
Tab.5 Effect of channel-reinforcement weighting position on network classification performance
| CAWS | CSRA | IRPE | ChestX-ray14 | CheXpert | MIMIC-CXR |
| ?    | ?    | ?    | 0.853        | 0.898    | 0.819     |
| ?    | ?    | ?    | 0.844        | 0.886    | 0.811     |
| ?    | ?    | ?    | 0.849        | 0.894    | 0.814     |
| ?    | ?    | ?    | 0.850        | 0.891    | 0.816     |
Tab.6 Effect of different modules on network classification performance (mean AUC)
| Loss function (mean AUC)      | ChestX-ray14 | CheXpert | MIMIC-CXR |
| Cross-entropy loss            | 0.846        | 0.889    | 0.810     |
| Focal loss                    | 0.850        | 0.892    | 0.816     |
| Dynamic difficulty loss       | 0.853        | 0.898    | 0.819     |
Tab.7 Effect of different loss functions on network classification performance
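Of the baselines in Tab. 7, focal loss [25] is the direct ancestor of the proposed loss. A minimal binary (multi-label) focal loss in NumPy is shown below; the dynamic, per-class difficulty weighting that the paper proposes extends this idea and is not reproduced here:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss [25] for multi-label chest disease outputs.

    p: predicted probabilities in (0, 1); y: binary labels (0 or 1).
    gamma down-weights easy, well-classified examples; alpha balances
    positive and negative samples.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)         # probability of the true label
    a = np.where(y == 1, alpha, 1 - alpha)  # positive/negative balance
    return -(a * (1 - pt) ** gamma * np.log(pt)).mean()
```

With gamma = 0 and alpha = 0.5 this reduces (up to a factor of 0.5) to binary cross-entropy, the first row of Tab. 7; increasing gamma shifts the gradient budget toward hard examples, which is the mechanism the dynamic difficulty loss adapts per disease.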
| Dataset split (mean AUC)            | CAWSNet | Swin Transformer |
| Folds 2, 3, 4, 5 train; fold 1 test | 0.841   | 0.832            |
| Folds 1, 3, 4, 5 train; fold 2 test | 0.836   | 0.830            |
| Folds 1, 2, 4, 5 train; fold 3 test | 0.834   | 0.829            |
| Folds 1, 2, 3, 5 train; fold 4 test | 0.840   | 0.832            |
| Folds 1, 2, 3, 4 train; fold 5 test | 0.841   | 0.833            |
| Mean                                | 0.838   | 0.831            |
Tab.8 Results of stratified k-fold cross-validation
| Network        | FLOPs/10^9 | t_inf/s | Mean AUC |
| SSGE[21]       | 17.74      | 0.059   | 0.830    |
| CheXGCN[37]    | 17.86      | 0.061   | 0.826    |
| PCAN[29]       | 3.92       | 0.054   | 0.830    |
| Swin Trans[12] | 4.37       | 0.013   | 0.837    |
| CAWSNet        | 4.52       | 0.018   | 0.853    |
Tab.9 Comparison of computational complexity of different chest disease classification networks
Fig.9 Doctor's marked lesion area (left) and Grad-CAM heat map (right)
Fig.10 Example of disease prediction score in CXR image
[1]   JACOBI A, CHUNG M, BERNHEIM A, et al. Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review[J]. Clinical Imaging, 2020, 64 (8): 35- 42
[2]   HEIDARI A, NAVIMIPOUR N J, UNAL M, et al. The COVID-19 epidemic analysis and diagnosis using deep learning: a systematic literature review and future directions[J]. Computers in Biology and Medicine, 2022, 141: 105141
doi: 10.1016/j.compbiomed.2021.105141
[3]   ZHENG Guangyuan, LIU Xiabi, HAN Guanghui. Survey on medical image computer aided detection and diagnosis systems[J]. Journal of Software, 2018, 29 (5): 1471-1514
[4]   CHEN J T, YU H Y, FENG R W, et al. Flow-Mixup: classifying multi-labeled medical images with corrupted labels [C]// IEEE International Conference on Bioinformatics and Biomedicine. Seoul: IEEE, 2020: 534-541.
[5]   ANWAR S M, MAJID M, QAYYUM A, et al. Medical image analysis using convolutional neural networks: a review[J]. Journal of Medical Systems, 2018, 42 (11): 226
doi: 10.1007/s10916-018-1088-1
[6]   YI X, WALIA E, BABYN P. Generative adversarial network in medical imaging: a review[J]. Medical Image Analysis, 2019, 58: 101552
doi: 10.1016/j.media.2019.101552
[7]   ZHOU S K, LE H N, LUU K, et al. Deep reinforcement learning in medical imaging: a literature review[J]. Medical Image Analysis, 2021, 73: 102193
doi: 10.1016/j.media.2021.102193
[8]   LI Q, LAI Y, ADAMU M J. Multi-level residual feature fusion network for thoracic disease classification in chest X-ray images[J]. IEEE Access, 2023, 11: 40988-41002
[9]   HU Jinbo, NIE Weizhi, SONG Dan, et al. Chest X-ray imaging disease diagnosis model assisted by deformable Transformer[J]. Journal of Zhejiang University: Engineering Science, 2023, 57 (10): 1923-1932
[10]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . [S. l.]: Curran Associates, 2017: 6000-6010.
[11]   DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03) [2023-08-05]. https://arxiv.org/pdf/2010.11929.pdf.
[12]   LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
[13]   WU K, PENG H W, CHEN M H, et al. Rethinking and improving relative position encoding for vision [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10013-10021.
[14]   ZHU K, WU J K. Residual attention: a simple but effective method for multi-label recognition [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 184-193.
[15]   WANG X S, PENG Y F, LU L, et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases [C]// 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3462-3471.
[16]   IRVIN J, RAJPURKAR P, KO M, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison [C]// 33rd AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 590-597.
[17]   JOHNSON A E W, POLLARD T J, BERKOWITZ S J, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports[J]. Scientific Data, 2019, 6 (1): 317
doi: 10.1038/s41597-019-0322-0
[18]   CHEN B Z, LI J X, GUO X B, et al. DualCheXNet: dual asymmetric feature learning for thoracic disease classification in chest X-rays[J]. Biomedical Signal Processing and Control, 2019, 53: 101554
doi: 10.1016/j.bspc.2019.04.031
[19]   WANG H Y, WANG S S, QIN Z B, et al. Triple attention learning for classification of 14 thoracic diseases using chest radiography[J]. Medical Image Analysis, 2021, 67 (1): 8415- 8423
[20]   CHEN K, WANG X Q, ZHANG S W. Thorax disease classification based on pyramidal convolution shuffle attention neural network[J]. IEEE Access, 2022, 10: 85571-85581
doi: 10.1109/ACCESS.2022.3198958
[21]   CHEN B Z, ZHANG Z, LI Y Z, et al. Multi-label chest X-ray image classification via semantic similarity graph embedding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (4): 2455- 2468
doi: 10.1109/TCSVT.2021.3079900
[22]   JIANG X B, ZHU Y, GAI G, et al. MXT: a new variant of pyramid vision Transformer for multi-label chest X-ray image classification[J]. Cognitive Computation, 2022, 14 (4): 1362- 1377
doi: 10.1007/s12559-022-10032-4
[23]   PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution [C]// IEEE Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 805-815.
[24]   HU J, SHEN L, SU G. Squeeze-and-Excitation networks [C]// 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
[25]   LIN T, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (2): 318- 327
doi: 10.1109/TPAMI.2018.2858826
[26]   PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library [C]// 33rd Conference on Neural Information Processing Systems . Vancouver: [s. n. ], 2019: 32.
[27]   PHAM H H, LE T T, TRAN D Q, et al. Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels[J]. Neurocomputing, 2021, 437: 186- 194
doi: 10.1016/j.neucom.2020.03.127
[28]   JIANG X B, ZHU Y, LIU Y T, et al. TransDD: a transformer-based dual-path decoder for improving the performance of thoracic diseases classification using chest X-ray[J]. Biomedical Signal Processing and Control, 2024, 91: 13
[29]   ZHU X F, PANG S M, ZHANG X X, et al. PCAN: pixel-wise classification and attention network for thoracic disease classification and weakly supervised localization[J]. Computerized Medical Imaging and Graphics, 2022, 102: 102137
doi: 10.1016/j.compmedimag.2022.102137
[30]   SUN Z X, QU L H, LUO J Z, et al. Label correlation transformer for automated chest X-ray diagnosis with reliable interpretability[J]. Radiologia Medica, 2023, 128 (6): 726-733
doi: 10.1007/s11547-023-01647-0
[31]   LEE Y W, HUANG S K, CHANG R F. CheXGAT: a disease correlation-aware network for thorax disease diagnosis from chest X-ray images[J]. Artificial Intelligence in Medicine, 2022, 132: 102382
doi: 10.1016/j.artmed.2022.102382
[32]   XIAO J F, BAI Y T, YUILLE A, et al. Delving into masked autoencoders for multi-label thorax disease classification [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 3577-3589.
[33]   LIU Z, CHENG Y Z, TAMURA S. Multi-label local to global learning: a novel learning paradigm for chest X-ray abnormality classification[J]. IEEE Journal of Biomedical and Health Informatics, 2023, 27 (9): 4409-4420
doi: 10.1109/JBHI.2023.3281466
[34]   ZHU X F, FENG Q. MVC-NET: multi-view chest radiograph classification network with deep fusion [C]// 18th IEEE International Symposium on Biomedical Imaging. Nice: IEEE, 2021: 554-558.
[35]   JACENKOW G, O'NEIL A Q, TSAFTARIS S A. Indication as prior knowledge for multimodal disease classification in chest radiographs with transformers [C]// IEEE International Symposium on Biomedical Imaging. Kolkata: IEEE, 2022.
[36]   SEIBOLD C, REISS S, SARFRAZ M S, et al. Breaking with fixed set pathology recognition through report-guided contrastive training [C]// Medical Image Computing and Computer Assisted Intervention. Singapore: Springer, 2022, 13435: 690-700.
[37]   CHEN B, LI J, LU G, et al. Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24 (8): 2292-2302
doi: 10.1109/JBHI.2020.2967084