Journal of Zhejiang University (Engineering Science)  2025, Vol. 59, Issue (5): 890-901    DOI: 10.3785/j.issn.1008-973X.2025.05.002
Computer Technology and Information Engineering
Classification network for chest disease based on convolution-assisted self-attention
Ziran ZHANG (张自然), Qiang LI (李锵), Xin GUAN (关欣)*
School of Microelectronics, Tianjin University, Tianjin 300072, China
Abstract:

A chest disease classification network based on convolution-assisted window self-attention, called CAWSNet, was proposed to address the varying lesion sizes, complex textures, and mutual interference among lesions in chest X-ray images. The Swin Transformer was used as the backbone, with window self-attention modeling long-range visual dependencies; convolution assistance was introduced to compensate for the deficiencies of window self-attention while strengthening local feature extraction. Image relative position encoding was introduced to dynamically calculate directed relative positions, helping the network better model the spatial relationships between pixels. Class-specific residual attention was employed to adjust the regions the classifier attends to according to the disease category, highlighting effective information and enhancing multi-label classification. A dynamic difficulty loss function was proposed to alleviate the large differences in classification difficulty across diseases and the imbalance of positive and negative samples in the datasets. Experimental results on the public datasets ChestX-Ray14, CheXpert and MIMIC-CXR-JPG show that the proposed CAWSNet achieves AUC scores of 0.853, 0.898 and 0.819, respectively, confirming the effectiveness and robustness of the network in diagnosing chest diseases from X-ray images.

Key words: chest X-ray image classification; window self-attention; convolution; image relative position encoding; dynamic difficulty loss function
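The abstract above outlines the core CAWS idea: multi-head self-attention computed inside local windows, assisted by a convolution branch that restores the local inductive bias. The PyTorch sketch below is only an illustration of that idea under stated assumptions (window size, head count, depthwise 3x3 convolution, additive fusion); it is not the authors' released implementation.

```python
# Illustrative sketch of convolution-assisted window self-attention: attention inside
# non-overlapping windows plus a parallel depthwise-convolution branch for local features.
# Module names, window size and the additive fusion are assumptions, not the paper's code.
import torch
import torch.nn as nn


class ConvAssistedWindowAttention(nn.Module):
    def __init__(self, dim: int, window: int = 7, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Depthwise 3x3 convolution supplies the local inductive bias
        # that plain window attention lacks.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W assumed divisible by the window size.
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows.
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)   # (B*nWin, w*w, C)
        t, _ = self.attn(t, t, t)                               # attention inside each window
        t = t.reshape(B, H // w, W // w, w, w, C)
        t = t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        # Fuse the global (windowed attention) and local (depthwise conv) branches.
        out = t + self.local(x)
        return self.proj(out.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)


if __name__ == "__main__":
    feats = torch.randn(2, 96, 56, 56)                          # stage-1-like Swin feature map
    print(ConvAssistedWindowAttention(96, 7, 4)(feats).shape)   # torch.Size([2, 96, 56, 56])
```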
Received: 2024-03-01    Published: 2025-04-25
CLC:  TP 391
Fund program: National Natural Science Foundation of China (62071323); Open Project of the State Key Laboratory of Ultrasound in Medicine and Engineering (2022KFKT004); Natural Science Foundation of Tianjin (22JCZDJC00220).
Corresponding author: Xin GUAN, E-mail: 260077200@qq.com; guanxin@tju.edu.cn
About the author: ZHANG Ziran (1998—), male, master's degree candidate, engaged in research on deep learning for image processing. orcid.org/0009-0008-2472-5280. E-mail: 260077200@qq.com

Cite this article:

ZHANG Ziran, LI Qiang, GUAN Xin. Classification network for chest disease based on convolution-assisted self-attention[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(5): 890-901.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.05.002        https://www.zjujournals.com/eng/CN/Y2025/V59/I5/890

Fig. 1  Overall architecture of CAWSNet
Fig. 2  A group of CAWS Transformer blocks
Fig. 3  Window shifting method
Fig. 4  Implementation of the CAWS module
Fig. 5  Class-specific residual attention classifier
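Figure 5 corresponds to the class-specific residual attention (CSRA) classifier [14]. A minimal sketch of a standard CSRA head is given below, assuming the usual formulation in which per-class logits are the average-pooled scores plus a residual, attention-pooled term; the loading factor lam and temperature T are illustrative values, not the paper's settings.

```python
# Minimal sketch of a class-specific residual attention (CSRA) head for multi-label
# classification: per-class logits = average-pooled scores + lam * attention-pooled scores.
import torch
import torch.nn as nn


class CSRAHead(nn.Module):
    def __init__(self, in_dim: int, num_classes: int, lam: float = 0.1, T: float = 1.0):
        super().__init__()
        self.fc = nn.Conv2d(in_dim, num_classes, kernel_size=1, bias=False)
        self.lam, self.T = lam, T

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, in_dim, H, W) backbone feature map.
        score = self.fc(feat).flatten(2)             # (B, num_classes, H*W) per-location class scores
        base = score.mean(dim=2)                     # global average pooling branch
        attn = torch.softmax(score * self.T, dim=2)  # class-specific spatial attention
        residual = (attn * score).sum(dim=2)         # attention-pooled branch
        return base + self.lam * residual            # residual combination -> multi-label logits


if __name__ == "__main__":
    logits = CSRAHead(in_dim=768, num_classes=14)(torch.randn(2, 768, 7, 7))
    print(logits.shape)                              # torch.Size([2, 14])
```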
Disease category   U-Ignore  U-Zeros  U-Ones  PCAN   DCNN   MAE    CAWSNet
Atelectasis        0.818     0.811    0.858   0.848  0.825  0.827  0.835
Cardiomegaly       0.828     0.840    0.832   0.865  0.855  0.835  0.856
Consolidation      0.938     0.932    0.899   0.908  0.937  0.925  0.917
Edema              0.934     0.929    0.941   0.912  0.930  0.938  0.953
Pleural Effusion   0.928     0.931    0.934   0.940  0.923  0.941  0.928
Mean               0.889     0.889    0.893   0.895  0.894  0.893  0.898
Table 2  Comparison of different chest disease classification networks on the CheXpert validation set (AUC)
Fig. 6  ROC curves and AUC values for chest diseases on the ChestX-ray14 test set
Fig. 7  ROC curves and AUC values for chest diseases on the CheXpert validation set
Fig. 8  ROC curves and AUC values for chest diseases on the MIMIC-CXR test set
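The ROC curves and AUC values in Figs. 6 to 8 can be computed from the network's per-class sigmoid outputs with scikit-learn; the sketch below uses random placeholder predictions and the ChestX-ray14 label names only to show the computation.

```python
# Per-disease ROC/AUC computation with scikit-learn; y_true/y_prob are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

CLASSES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
           "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
           "Emphysema", "Fibrosis", "Pleural Thickening", "Hernia"]

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, len(CLASSES)))    # multi-label ground truth
y_prob = rng.random(size=(1000, len(CLASSES)))            # sigmoid outputs of the network

aucs = []
for i, name in enumerate(CLASSES):
    fpr, tpr, _ = roc_curve(y_true[:, i], y_prob[:, i])   # points of the ROC curve
    auc_i = roc_auc_score(y_true[:, i], y_prob[:, i])     # area under that curve
    aucs.append(auc_i)
    print(f"{name:20s} AUC = {auc_i:.3f}")
print(f"mean AUC = {np.mean(aucs):.3f}")
```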
Disease category     MXT    TransDD  PCAN   PCSANet  LCT    CheXGAT  SSGE   ML-LGL  CAWSNet
Atelectasis          0.798  0.791    0.785  0.807    0.789  0.787    0.792  0.782   0.829
Cardiomegaly         0.896  0.885    0.897  0.910    0.889  0.879    0.892  0.904   0.918
Effusion             0.842  0.842    0.837  0.879    0.842  0.837    0.840  0.835   0.892
Infiltration         0.719  0.715    0.706  0.698    0.694  0.699    0.714  0.707   0.726
Mass                 0.856  0.837    0.834  0.824    0.843  0.839    0.848  0.853   0.857
Nodule               0.809  0.803    0.786  0.750    0.803  0.793    0.812  0.779   0.784
Pneumonia            0.758  0.745    0.730  0.750    0.742  0.741    0.733  0.739   0.782
Pneumothorax         0.879  0.885    0.871  0.850    0.896  0.879    0.885  0.889   0.903
Consolidation        0.759  0.753    0.763  0.802    0.757  0.755    0.753  0.771   0.820
Edema                0.849  0.859    0.849  0.888    0.858  0.851    0.848  0.866   0.906
Emphysema            0.906  0.944    0.921  0.890    0.944  0.945    0.948  0.949   0.935
Fibrosis             0.847  0.849    0.817  0.812    0.863  0.842    0.827  0.846   0.827
Pleural Thickening   0.800  0.803    0.791  0.768    0.799  0.794    0.795  0.787   0.817
Hernia               0.913  0.924    0.943  0.915    0.915  0.931    0.932  0.907   0.939
Mean                 0.830  0.831    0.824  0.825    0.831  0.827    0.830  0.830   0.853
Table 1  Comparison of different chest disease classification networks on the ChestX-Ray14 test set (AUC)
Disease category             MVCNet  MMBT   MedCLIP  CAWSNet
Atelectasis                  0.818   0.758  –        0.841
Cardiomegaly                 0.848   0.826  –        0.824
Consolidation                0.829   0.771  –        0.833
Edema                        0.919   0.843  –        0.900
Enlarged Cardiomediastinum   0.725   0.743  –        0.771
Fracture                     0.665   0.729  –        0.660
Lung Lesion                  0.740   0.759  –        0.804
Lung Opacity                 0.757   0.715  –        0.748
No Finding                   0.842   0.831  –        0.867
Pleural Effusion             0.947   0.886  –        0.922
Pleural Other                0.825   0.869  –        0.858
Pneumonia                    0.715   0.752  –        0.758
Pneumothorax                 0.899   0.880  –        0.861
Mean                         0.810   0.797  0.804    0.819
Table 3  Comparison of different chest disease classification networks on the MIMIC-CXR test set (AUC; MedCLIP is reported only as a mean)
Generation method                ChestX-ray14  CheXpert  MIMIC-CXR
Bias                             0.849         0.890     0.813
Interaction with input k         0.853         0.898     0.819
Interaction with inputs q and k  0.850         0.894     0.816
Table 4  Effect of the position-encoding generation method on classification performance (mean AUC)
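Table 4 compares how the relative position term is generated. The sketch below contrasts, in the spirit of image relative position encoding (iRPE) [13], a pure learned bias per relative position with a contextual term obtained by letting a relative-position embedding interact with the key, the setting reported as best above; window size, head dimension and variable names are illustrative assumptions.

```python
# Two ways of producing the relative position term inside one attention window.
import torch
import torch.nn as nn

w, head_dim = 7, 32
n = w * w                                                     # tokens per window

# Index of the relative position between every pair of tokens inside a window.
coords = torch.stack(torch.meshgrid(torch.arange(w), torch.arange(w), indexing="ij"))
coords = coords.flatten(1)                                    # (2, n)
rel = coords[:, :, None] - coords[:, None, :] + (w - 1)       # shift offsets to be non-negative
rel_index = rel[0] * (2 * w - 1) + rel[1]                     # (n, n), values in [0, (2w-1)^2)

q = torch.randn(n, head_dim)
k = torch.randn(n, head_dim)

# Mode 1: bias - one learned scalar per relative position, added to the logits.
bias_table = nn.Parameter(torch.zeros((2 * w - 1) ** 2))
logits_bias = q @ k.T / head_dim ** 0.5 + bias_table[rel_index]

# Mode 2: interaction with k - a learned embedding per relative position is
# contracted with the key, so the positional term depends on the content.
embed_table = nn.Parameter(torch.randn((2 * w - 1) ** 2, head_dim) * 0.02)
pos_embed = embed_table[rel_index]                            # (n, n, head_dim)
logits_ctx = q @ k.T / head_dim ** 0.5 + torch.einsum("jd,ijd->ij", k, pos_embed)

print(logits_bias.shape, logits_ctx.shape)                    # torch.Size([49, 49]) twice
```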
Weighting position    ChestX-ray14  CheXpert  MIMIC-CXR
Weighting of input q  0.852         0.895     0.817
Weighting of input k  0.851         0.896     0.817
Weighting of input v  0.853         0.898     0.819
Table 5  Effect of the channel-enhancement weighting position on classification performance (mean AUC)
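Table 5 ablates where a channel-enhancement weighting is applied among q, k and v. A minimal sketch is shown below, assuming a squeeze-and-excitation style gate [24] applied to the value tokens (the best setting above); the reduction ratio and module layout are assumptions, not the paper's design.

```python
# Squeeze-and-excitation style channel gate applied to window tokens (here, v).
import torch
import torch.nn as nn


class ChannelWeighting(nn.Module):
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) window tokens; squeeze over tokens, excite over channels.
        weights = self.gate(tokens.mean(dim=1, keepdim=True))   # (B, 1, C) channel gate
        return tokens * weights


if __name__ == "__main__":
    v = torch.randn(4, 49, 96)          # value tokens of one batch of attention windows
    v = ChannelWeighting(96)(v)         # channel-enhanced v fed into the attention
    print(v.shape)                      # torch.Size([4, 49, 96])
```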
CAWS  CSRA  IRPE   ChestX-ray14  CheXpert  MIMIC-CXR
✓     ✓     ✓      0.853         0.898     0.819
?     ?     ?      0.844         0.886     0.811
?     ?     ?      0.849         0.894     0.814
?     ?     ?      0.850         0.891     0.816
Table 6  Effect of different modules on classification performance (mean AUC)
Loss function            ChestX-ray14  CheXpert  MIMIC-CXR
Cross-entropy loss       0.846         0.889     0.810
Focal loss               0.850         0.892     0.816
Dynamic difficulty loss  0.853         0.898     0.819
Table 7  Effect of different loss functions on classification performance (mean AUC)
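The exact formulation of the dynamic difficulty loss is not given on this page. The sketch below only illustrates the kind of mechanism Table 7 compares: a focal-style binary cross-entropy [25] whose focusing strength follows a running per-class difficulty estimate, so harder classes and minority labels receive larger weights; the difficulty measure and its update rule are assumptions, not the authors' definition.

```python
# Hedged, illustrative sketch: BCE re-weighted by a per-class, dynamically updated
# difficulty estimate. Names and the update rule are assumptions, not the paper's loss.
import torch
import torch.nn.functional as F


class DifficultyWeightedBCE(torch.nn.Module):
    def __init__(self, num_classes: int, gamma_max: float = 2.0, momentum: float = 0.9):
        super().__init__()
        self.gamma_max, self.momentum = gamma_max, momentum
        # Running per-class difficulty in [0, 1]; starts at 0.5 (unknown).
        self.register_buffer("difficulty", torch.full((num_classes,), 0.5))

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits)
        # p_t: probability assigned to the correct label (covers positive/negative imbalance).
        p_t = torch.where(targets > 0.5, probs, 1.0 - probs)
        with torch.no_grad():
            batch_err = (1.0 - p_t).mean(dim=0)                       # current per-class error
            self.difficulty.mul_(self.momentum).add_((1 - self.momentum) * batch_err)
        gamma = self.gamma_max * self.difficulty                      # harder class -> stronger focusing
        focal = (1.0 - p_t) ** gamma                                  # down-weight easy examples
        bce = F.binary_cross_entropy_with_logits(logits, targets.float(), reduction="none")
        return (focal * bce).mean()


if __name__ == "__main__":
    crit = DifficultyWeightedBCE(num_classes=14)
    loss = crit(torch.randn(8, 14), torch.randint(0, 2, (8, 14)))
    print(loss.item())
```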
Dataset split                                        CAWSNet  Swin Transformer
Folds 2, 3, 4, 5 for training, fold 1 for testing    0.841    0.832
Folds 1, 3, 4, 5 for training, fold 2 for testing    0.836    0.830
Folds 1, 2, 4, 5 for training, fold 3 for testing    0.834    0.829
Folds 1, 2, 3, 5 for training, fold 4 for testing    0.840    0.832
Folds 1, 2, 3, 4 for training, fold 5 for testing    0.841    0.833
Mean                                                 0.838    0.831
Table 8  Results of stratified K-fold cross-validation (mean AUC)
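Table 8 uses a five-fold protocol in which each fold serves once as the test set. A minimal sketch with scikit-learn's StratifiedKFold follows; since plain stratification needs a single label vector, the example stratifies on one representative disease label, which is an assumption about the exact multi-label protocol.

```python
# Five-fold split: four folds train, one fold tests, rotating over all folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n_images = 1000
labels = rng.integers(0, 2, size=(n_images, 14))          # placeholder multi-label matrix
strat_key = labels[:, 0]                                   # e.g. stratify on "Atelectasis"

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(np.zeros((n_images, 1)), strat_key), start=1):
    # train_idx -> folds used for training, test_idx -> the held-out fold of Table 8
    pos_rate = labels[test_idx, 0].mean()
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test, pos rate {pos_rate:.3f}")
```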
Network                FLOPs/10⁹  t_inf/s  Mean AUC
SSGE[21]               17.74      0.059    0.830
CheXGCN[37]            17.86      0.061    0.826
PCAN[29]               3.92       0.054    0.830
Swin Transformer[12]   4.37       0.013    0.837
CAWSNet                4.52       0.018    0.853
Table 9  Comparison of computational complexity of different chest disease classification networks
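t_inf in Table 9 is the average inference time per image. A simple way to measure it in PyTorch is sketched below, with a torchvision ResNet-18 standing in for CAWSNet because the network's code is not part of this page; warm-up iterations and CUDA synchronization keep the timing honest.

```python
# Measuring per-image inference time with warm-up and synchronized timing.
import time
import torch
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18().to(device).eval()                  # stand-in model, not CAWSNet
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):                               # warm-up so lazy initialisation is excluded
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    runs = 100
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"t_inf = {(time.perf_counter() - start) / runs:.4f} s per image")
```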
Fig. 9  Lesion regions annotated by physicians (left) and Grad-CAM heat maps (right)
Fig. 10  Examples of disease prediction scores on CXR images
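Figure 9 contrasts physician-annotated lesion regions with Grad-CAM heat maps. The sketch below shows a compact Grad-CAM computation using a forward hook and a tensor gradient hook; the stand-in backbone, the hooked layer and the target class index are illustrative assumptions.

```python
# Grad-CAM: gradients of one class score weight the channels of a late feature map.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18().eval()                                 # stand-in backbone
feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats["a"] = output                                   # keep the activation
    output.register_hook(lambda g: grads.update(a=g))     # and the gradient flowing into it

model.layer4.register_forward_hook(fwd_hook)              # last convolutional stage

img = torch.randn(1, 3, 224, 224)
score = model(img)[0, 3]                                  # score of one target class (index is arbitrary)
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)            # channel importance
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))  # raw low-resolution heat map
cam = F.interpolate(cam, size=img.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalise to [0, 1]
print(cam.shape)                                               # torch.Size([1, 1, 224, 224])
```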
1 JACOBI A, CHUNG M, BERNHEIM A, et al. Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review[J]. Clinical Imaging, 2020, 64(8): 35-42
2 HEIDARI A, NAVIMIPOUR N J, UNAL M, et al. The COVID-19 epidemic analysis and diagnosis using deep learning: a systematic literature review and future directions[J]. Computers in Biology and Medicine, 2022, 141: 105141
doi: 10.1016/j.compbiomed.2021.105141
3 ZHENG Guangyuan, LIU Xiabi, HAN Guanghui. Survey on medical image computer aided detection and diagnosis systems[J]. Journal of Software, 2018, 29(5): 1471-1514
4 CHEN J T, YU H Y, FENG R W, et al. Flow-Mixup: classifying multi-labeled medical images with corrupted labels [C]// IEEE International Conference on Bioinformatics and Biomedicine. Seoul: IEEE, 2020: 534-541.
5 ANWAR S M, MAJID M, QAYYUM A, et al. Medical image analysis using convolutional neural networks: a review[J]. Journal of Medical Systems, 2018, 42 (11): 226
doi: 10.1007/s10916-018-1088-1
6 YI X, WALIA E, BABYN P Generative adversarial network in medical imaging: a review[J]. Medical Image Analysis, 2019, 58: 101552
doi: 10.1016/j.media.2019.101552
7 ZHOU S K, LE H N, LUU K, et al. Deep reinforcement learning in medical imaging: a literature review[J]. Medical Image Analysis, 2021, 73: 102193
doi: 10.1016/j.media.2021.102193
8 LI Q, LAI Y, ADAMU M J. Multi-level residual feature fusion network for thoracic disease classification in chest x-ray images[J]. IEEE Access, 2023, 11(11): 40988-41002
9 HU Jinbo, NIE Weizhi, SONG Dan, et al. Chest X-ray imaging disease diagnosis model assisted by deformable Transformer[J]. Journal of Zhejiang University (Engineering Science), 2023, 57(10): 1923-1932
10 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems . [S. l.]: Curran Associates, 2017: 6000-6010.
11 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03) [2023-08-05]. https://arxiv.org/pdf/2010.11929.pdf.
12 LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 9992-10002.
13 WU K, PENG H W, CHEN M H, et al. Rethinking and improving relative position encoding for vision [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10013-10021.
14 ZHU K, WU J K. Residual attention: a simple but effective method for multi-label recognition [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 184-193.
15 WANG X S, PENG Y F, LU L, et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases [C]// 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3462-3471.
16 IRVIN J, RAJPURKAR P, KO M, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison [C]// 33rd AAAI Conference on Artificial Intelligence. Honolulu: AAAI, 2019: 590-597.
17 JOHNSON A E W, POLLARD T J, BERKOWITZ S J, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports[J]. Scientific Data, 2019, 6 (1): 317
doi: 10.1038/s41597-019-0322-0
18 CHEN B Z, LI J X, GUO X B, et al. DualCheXNet: dual asymmetric feature learning for thoracic disease classification in chest X-rays[J]. Biomedical Signal Processing and Control, 2019, 53: 101554
doi: 10.1016/j.bspc.2019.04.031
19 WANG H Y, WANG S S, QIN Z B, et al. Triple attention learning for classification of 14 thoracic diseases using chest radiography[J]. Medical Image Analysis, 2021, 67(1): 8415-8423
20 CHEN K, WANG X Q, ZHANG S W. Thorax disease classification based on pyramidal convolution shuffle attention neural network[J]. IEEE Access, 2022, 10: 85571-85581
doi: 10.1109/ACCESS.2022.3198958
21 CHEN B Z, ZHANG Z, LI Y Z, et al. Multi-label chest X-ray image classification via semantic similarity graph embedding[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32 (4): 2455- 2468
doi: 10.1109/TCSVT.2021.3079900
22 JIANG X B, ZHU Y, GAI G, et al. MXT: a new variant of pyramid vision Transformer for multi-label chest X-ray image classification[J]. Cognitive Computation, 2022, 14 (4): 1362- 1377
doi: 10.1007/s12559-022-10032-4
23 PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution [C]// IEEE Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 805-815.
24 HU J, SHEN L, SU G. Squeeze-and-Excitation networks [C]// 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 7132-7141.
25 LIN T, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (2): 318- 327
doi: 10.1109/TPAMI.2018.2858826
26 PASZKE A, GROSS S, MASSA F, et al. PyTorch: an imperative style, high-performance deep learning library [C]// 33rd Conference on Neural Information Processing Systems . Vancouver: [s. n. ], 2019: 32.
27 PHAM H H, LE T T, TRAN D Q, et al. Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels[J]. Neurocomputing, 2021, 437: 186- 194
doi: 10.1016/j.neucom.2020.03.127
28 JIANG X B, ZHU Y, LIU Y T, et al. TransDD: a transformer-based dual-path decoder for improving the performance of thoracic diseases classification using chest X-ray[J]. Biomedical Signal Processing and Control, 2024, 91: 13
29 ZHU X F, PANG S M, ZHANG X X, et al. PCAN: pixel-wise classification and attention network for thoracic disease classification and weakly supervised localization[J]. Computerized Medical Imaging and Graphics, 2022, 102: 102137
doi: 10.1016/j.compmedimag.2022.102137
30 SUN Z X, QU L H, LUO J Z, et al. Label correlation transformer for automated chest X-ray diagnosis with reliable interpretability[J]. Radiologia Medica, 2023, 128(6): 726-733
doi: 10.1007/s11547-023-01647-0
31 LEE Y W, HUANG S K, CHANG R F. CheXGAT: a disease correlation-aware network for thorax disease diagnosis from chest X-ray images[J]. Artificial Intelligence in Medicine, 2022, 132: 102382
doi: 10.1016/j.artmed.2022.102382
32 XIAO J F, BAI Y T, YUILLE A, et al. Delving into masked autoencoders for multi-label thorax disease classification [C]// IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2023: 3577-3589.
33 LIU Z, CHENG Y Z, TAMURA S. Multi-label local to global learning: a novel learning paradigm for chest x-ray abnormality classification[J]. IEEE Journal of Biomedical and Health Informatics, 2023, 27(9): 4409-4420
doi: 10.1109/JBHI.2023.3281466
34 ZHU X F, FENG Q. MVC-NET: multi-view chest radiograph classification network with deep fusion [C]// 18th IEEE International Symposium on Biomedical Imaging. Nice: IEEE, 2021: 554-558.
35 JACENKOW G, O'NEIL A Q, TSAFTARIS S A. Indication as prior knowledge for multimodal disease classification in chest radiographs with transformers [C]// IEEE International Symposium on Biomedical Imaging. Kolkata: IEEE, 2022.
36 SEIBOLD C, REISS S, SARFRAZ M S, et al. Breaking with fixed set pathology recognition through report-guided contrastive training [C]// Medical Image Computing and Computer Assisted Intervention. Singapore: Springer, 2022, 13435: 690-700.
37 CHEN B, LI J, LU G, et al. Label co-occurrence learning with graph convolutional networks for multi-label chest x-ray image classification[J]. IEEE Journal of Biomedical and Health Informatics, 2020, 24(8): 2292-2302
doi: 10.1109/JBHI.2020.2967084