Journal of Zhejiang University (Medical Sciences)  2019, Vol. 48 Issue (6): 594-602    DOI: 10.3785/j.issn.1008-9292.2019.12.02
Original Article
Application of Logistic regression and decision tree analysis in prediction of acute myocardial infarction events
ZHANG Sheng1, HU Zhenjie2, YE Lu3, ZHENG Yaru4,*
1. Department of Neurology, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
2. Department of Respiratory and Critical Medicine, No. 906 Hospital of Chinese PLA, Ningbo 315040, China
3. Clinical Laboratory, Mental Health Center of Zhejiang University School of Medicine, Hangzhou Seventh People's Hospital, Hangzhou 310013, China
4. Department of Cardiology, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
Abstract:

Objective: To evaluate and compare the feasibility and effectiveness of Logistic regression and decision tree analysis in predicting acute myocardial infarction (AMI) events. Methods: The clinical data of 295 patients who underwent coronary angiography for angina or chest pain of unidentified cause at Zhejiang Provincial People's Hospital between October 2018 and April 2019 were retrospectively analyzed. Fifty-five patients were diagnosed with AMI. Logistic regression and decision tree analysis were each used to build a predictive model for AMI events. Two decision tree models were built: one independent of the Logistic regression results (Tree 1) and one based on them (Tree 2). The predictive value of the three models was then compared using the area under the receiver operating characteristic (ROC) curve. Results: Binary Logistic regression analysis showed that history of coronary artery disease, multi-vessel coronary artery disease, statin use and apolipoprotein A1 (ApoA1) level were independent influencing factors of AMI events (all P < 0.05). In the Logistic regression-independent decision tree model (Tree 1), multi-vessel coronary artery disease was the root node, and history of coronary artery disease, ApoA1 level (cutoff value: 1.314 g/L) and anti-platelet drug use were descendant nodes. In the Logistic regression-dependent decision tree model (Tree 2), multi-vessel coronary artery disease was again the root node, followed by only two descendant nodes: history of coronary artery disease and ApoA1 level. The area under the curve (AUC) of the Logistic regression model was 0.826, and the AUCs of the decision tree models were 0.765 (Tree 1) and 0.726 (Tree 2). The AUC of the Logistic regression model was significantly higher than that of Tree 2 (95% CI: 0.041 to 0.145, Z=3.534, P < 0.001), but did not differ significantly from that of Tree 1 (95% CI: -0.014 to 0.121, Z=-1.173, P > 0.05).
Conclusion: The predictive value for AMI events was comparable between the Logistic regression-independent decision tree model and the Logistic regression model, suggesting that decision tree analysis may prove useful in the prevention and treatment of AMI.

Key words: Myocardial infarction; Acute disease; Logistic models; Regression analysis; Decision trees; Forecasting
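The abstract compares models by AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen AMI patient receives a higher predicted score than a randomly chosen non-AMI patient. The function name `auc_from_scores` and the scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities, NOT data from the study.
ami_scores = [0.9, 0.8, 0.35, 0.6]
non_ami_scores = [0.1, 0.4, 0.35, 0.2, 0.7]

example_auc = auc_from_scores(ami_scores, non_ami_scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.826, 0.765 and 0.726 are read.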
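Received: 2019-06-05    Published: 2020-01-19
CLC number: R542.2+2
Funding: National Natural Science Foundation of China (81801162); Clinical Research Fund Project of the Zhejiang Medical Association (2017XYC-A02)
Corresponding author: ZHENG Yaru     E-mail: xiaoxiaoqing_23@hotmail.com; zhengyaru@zjheart.com
About the author: ZHANG Sheng (1986-), female, PhD, attending physician, mainly engaged in clinical research on cardio-cerebrovascular diseases; E-mail: xiaoxiaoqing_23@hotmail.com; https://orcid.org/0000-0003-0644-7930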
Cite this article:

ZHANG Sheng, HU Zhenjie, YE Lu, ZHENG Yaru. Application of Logistic regression and decision tree analysis in prediction of acute myocardial infarction events. J Zhejiang Univ (Med Sci), 2019, 48(6): 594-602.

Link to this article:

http://www.zjujournals.com/med/CN/10.3785/j.issn.1008-9292.2019.12.02        http://www.zjujournals.com/med/CN/Y2019/V48/I6/594

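Table 1  Univariate analysis of factors associated with AMI

| Variable | Non-AMI group (n=240) | AMI group (n=55) | P |
| --- | --- | --- | --- |
| Age (years) | 65 (54-71) | 66 (60-76) | <0.05 |
| Female | 95 (39.6) | 15 (27.3) | >0.05 |
| History of hypertension | 153 (63.8) | 34 (61.8) | >0.05 |
| History of diabetes mellitus | 49 (20.4) | 16 (29.1) | >0.05 |
| History of atrial fibrillation | 15 (6.3) | 7 (12.7) | >0.05 |
| History of coronary artery disease | 95 (39.6) | 11 (20.0) | <0.01 |
| History of myocardial infarction | 10 (4.2) | 1 (1.8) | >0.05 |
| History of stroke | 11 (4.6) | 3 (5.5) | >0.05 |
| Anti-platelet drug use | 57 (23.8) | 3 (5.5) | <0.01 |
| Statin use | 59 (24.6) | 2 (3.6) | <0.01 |
| Anticoagulant use | 5 (2.1) | 0 (0) | >0.05 |
| History of coronary stent implantation | 45 (18.8) | 2 (3.6) | <0.01 |
| White blood cell count (×10⁹/L) | 6.2±1.8 | 8.2±3.0 | <0.01 |
| Alanine aminotransferase (U/L) | 19 (14-27) | 30 (15-46) | >0.05 |
| Aspartate aminotransferase (U/L) | 22 (19-28) | 60 (25-201) | >0.05 |
| Creatinine (μmol/L) | 85.7±40.0 | 87.0±26.1 | >0.05 |
| Total cholesterol (mmol/L) | 4.2±1.3 | 4.2±1.0 | >0.05 |
| Triglycerides (mmol/L) | 1.6±1.7 | 1.1±0.5 | >0.05 |
| High-density lipoprotein (mmol/L) | 1.1±0.4 | 1.1±0.2 | >0.05 |
| Low-density lipoprotein (mmol/L) | 2.2±0.9 | 2.6±0.8 | <0.05 |
| Apolipoprotein A1 (g/L) | 1.2±0.2 | 1.1±0.2 | <0.01 |
| Apolipoprotein B (g/L) | 0.7±0.2 | 0.8±0.2 | >0.05 |
| Left anterior descending artery stenosis (%) | 42.3±34.3 | 77.0±27.6 | <0.01 |
| Left circumflex artery stenosis (%) | 0 (0-50) | 70 (0-90) | <0.01 |
| Right coronary artery stenosis (%) | 30 (0-50) | 70 (30-90) | <0.01 |
| Multi-vessel coronary artery disease | 71 (29.6) | 39 (70.9) | <0.01 |
| Carotid plaque | 104 (43.3) | 23 (41.8) | >0.05 |
| Maximum plaque area (mm²) | 0 (0-18.4) | 0 (0-37.1) | >0.05 |
| Hypoechoic plaque | 57 (23.8) | 11 (20.0) | >0.05 |
| Hyperechoic plaque | 54 (22.5) | 8 (14.5) | >0.05 |
| Mixed-echo plaque | 33 (13.8) | 10 (18.2) | >0.05 |
| Unstable plaque | 79 (32.9) | 19 (34.5) | >0.05 |
| Carotid stenosis | 13 (5.4) | 5 (9.1) | >0.05 |

AMI: acute myocardial infarction.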
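The P values for the categorical rows of Table 1 can be sanity-checked from the counts alone. The page does not say which test the authors used, so the sketch below assumes a Pearson chi-square test without continuity correction on a 2×2 table. The helper name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Multi-vessel coronary artery disease: 71 of 240 non-AMI vs 39 of 55 AMI patients.
chi2_mvd, p_mvd = chi_square_2x2(71, 240 - 71, 39, 55 - 39)

# History of hypertension: 153 of 240 non-AMI vs 34 of 55 AMI patients.
chi2_htn, p_htn = chi_square_2x2(153, 240 - 153, 34, 55 - 34)

print(f"multi-vessel disease: chi2 = {chi2_mvd:.2f}, p = {p_mvd:.2g}")
print(f"hypertension:         chi2 = {chi2_htn:.2f}, p = {p_htn:.2g}")
```

Under this assumption, multi-vessel disease falls below the 0.01 threshold and hypertension stays above 0.05, in line with the categories reported in the table.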
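Table 2  Binary Logistic regression analysis for predicting acute myocardial infarction events

| Variable | OR | 95% CI | P |
| --- | --- | --- | --- |
| History of coronary artery disease | 0.280 | 0.116-0.673 | <0.01 |
| Anti-platelet drug use | 1.368 | 0.280-6.673 | >0.05 |
| Statin use | 0.060 | 0.006-0.638 | <0.05 |
| History of coronary stent implantation | 0.593 | 0.095-3.691 | >0.05 |
| Low-density lipoprotein | 0.945 | 0.664-1.345 | >0.05 |
| Aspartate aminotransferase | 1.000 | 0.999-1.002 | >0.05 |
| Apolipoprotein A1 | 0.112 | 0.020-0.626 | <0.05 |
| Multi-vessel coronary artery disease | 8.981 | 4.216-19.128 | <0.01 |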
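Each odds ratio in Table 2 corresponds to a Logistic regression coefficient on the log-odds scale. The sketch below recovers an approximate coefficient, standard error and two-sided P value from a reported OR and its 95% CI. It assumes the intervals are Wald intervals (coefficient ± 1.96 × SE), which the page does not state. The function name `wald_from_or` is hypothetical.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back out coefficient, SE, z and two-sided p from an OR and its CI.

    Assumes a Wald interval: exp(beta - z_crit * se) to exp(beta + z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p_value


# Rows taken from Table 2.
rows = {
    "Multi-vessel coronary artery disease": (8.981, 4.216, 19.128),
    "History of coronary artery disease": (0.280, 0.116, 0.673),
    "Anti-platelet drug use": (1.368, 0.280, 6.673),
}

results = {name: wald_from_or(*values) for name, values in rows.items()}
for name, (beta, se, z, p_value) in results.items():
    print(f"{name}: beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p_value:.2g}")
```

An OR below 1, such as 0.280 for history of coronary artery disease, yields a negative coefficient. In this model it means lower odds of AMI after adjustment for the other variables.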
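Figure 1  Decision tree model for predicting AMI events, built independently of the Logistic regression results
Figure 2  ROC curves of the Logistic regression and decision tree models for predicting AMI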
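Table 3  Variables entering the decision tree models after splitting the data set

| Decision tree | Variable | Level* | P# |
| --- | --- | --- | --- |
| Tree 1 | Multi-vessel coronary artery disease | 1 | 0.031 |
| Tree 1 | History of coronary artery disease | 2 | 0.045 |
| Tree 1 | Apolipoprotein A1 | 3 | 0.020 |
| Tree 1 | Anti-platelet drug use | 3 | 0.001 |
| Tree 2 | Multi-vessel coronary artery disease | 1 | 0.035 |
| Tree 2 | History of coronary artery disease | 2 | 0.027 |
| Tree 2 | Apolipoprotein A1 | 3 | 0.004 |

*Node level of each variable in the decision tree (level 1 is the root node; levels 2 and 3 are descendant nodes of successively lower level); #obtained by comparative analysis after splitting the data at the node of each variable.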
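To illustrate why multi-vessel coronary artery disease is a plausible root node, the sketch below scores that split using the counts from Table 1 (39 of 55 AMI and 71 of 240 non-AMI patients had multi-vessel disease). The page does not name the tree-growing algorithm, and the node-level P values suggest a significance-based method. This example uses Gini impurity instead, a different and simpler criterion, purely as an illustration. The function names are hypothetical.

```python
def gini(ami, non_ami):
    """Gini impurity of a node with the given two-class counts."""
    n = ami + non_ami
    if n == 0:
        return 0.0
    p = ami / n
    return 2 * p * (1 - p)


def gini_decrease(left, right):
    """Impurity decrease of a binary split; left/right are (AMI, non-AMI) counts."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    children = (n_left * gini(*left) + n_right * gini(*right)) / n
    return parent - children


# Root split on multi-vessel disease, counts derived from Table 1.
no_mvd = (55 - 39, 240 - 71)  # (AMI, non-AMI) without multi-vessel disease
mvd = (39, 71)                # (AMI, non-AMI) with multi-vessel disease

# For comparison: split on history of hypertension (34 of 55 vs 153 of 240).
no_htn = (55 - 34, 240 - 153)
htn = (34, 153)

decrease_mvd = gini_decrease(no_mvd, mvd)
decrease_htn = gini_decrease(no_htn, htn)
print(f"multi-vessel disease split: {decrease_mvd:.4f}")
print(f"hypertension split:         {decrease_htn:.4f}")
```

The multi-vessel split reduces impurity far more than the hypertension split, which is the kind of contrast a tree-growing procedure uses when ranking candidate variables.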
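Table 4  ROC curve analysis of the Logistic regression and decision tree models for predicting AMI

| Model | AUC | Standard error | P | 95% CI | Accuracy (%) | Sensitivity (%) | Specificity (%) | Youden index |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Logistic regression | 0.826 | 0.032 | <0.01 | 0.762-0.889 | 86.2 | 75.9 | 79.7 | 0.56 |
| Tree 1 | 0.765 | 0.041 | <0.01 | 0.684-0.846 | 85.4 | 61.8 | 91.2 | 0.53 |
| Tree 2 | 0.726 | 0.044 | <0.01 | 0.641-0.812 | 85.1 | 52.7 | 92.5 | 0.45 |

AMI: acute myocardial infarction.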
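Two columns of Table 4 follow from the others by simple arithmetic. The sketch below recomputes the Youden index (sensitivity + specificity - 1) and a normal-approximation interval for the AUC (AUC ± 1.96 × SE). The normal approximation is an assumption here, since the page does not state how the intervals were derived. The function names are hypothetical.

```python
def youden_index(sensitivity_pct, specificity_pct):
    """Youden index J = sensitivity + specificity - 1, from percentages."""
    return sensitivity_pct / 100 + specificity_pct / 100 - 1


def normal_ci(auc, se, z_crit=1.96):
    """Normal-approximation confidence interval for an AUC estimate."""
    return auc - z_crit * se, auc + z_crit * se


# (AUC, standard error, sensitivity %, specificity %) taken from Table 4.
table4 = {
    "Logistic regression": (0.826, 0.032, 75.9, 79.7),
    "Tree 1": (0.765, 0.041, 61.8, 91.2),
    "Tree 2": (0.726, 0.044, 52.7, 92.5),
}

for model, (auc, se, sens, spec) in table4.items():
    low, high = normal_ci(auc, se)
    j = youden_index(sens, spec)
    print(f"{model}: Youden = {j:.2f}, 95% CI = {low:.3f} to {high:.3f}")
```

The recomputed values agree with the table to within rounding. The decision trees trade sensitivity for specificity relative to the Logistic regression model, which is why their Youden indices are lower despite similar accuracy.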
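Table 5  Univariate analysis for predicting multi-vessel coronary artery disease

| Variable | Non-multi-vessel disease (n=185) | Multi-vessel disease (n=110) | P |
| --- | --- | --- | --- |
| Age (years) | 62±12 | 67±12 | <0.01 |
| Female | 82 (44.3) | 28 (25.5) | <0.01 |
| History of hypertension | 116 (62.7) | 71 (64.5) | >0.05 |
| History of diabetes mellitus | 28 (15.1) | 37 (33.6) | <0.01 |
| History of atrial fibrillation | 14 (7.6) | 8 (7.3) | >0.05 |
| History of coronary artery disease | 55 (29.7) | 51 (46.4) | <0.01 |
| History of myocardial infarction | 6 (3.2) | 5 (4.5) | >0.05 |
| History of stroke | 7 (3.8) | 7 (6.4) | >0.05 |
| Anti-platelet drug use | 37 (20.0) | 23 (20.9) | >0.05 |
| Statin use | 41 (22.2) | 20 (18.2) | >0.05 |
| Anticoagulant use | 4 (2.2) | 1 (0.9) | >0.05 |
| History of coronary stent implantation | 27 (14.6) | 20 (18.2) | >0.05 |
| White blood cell count (×10⁹/L) | 6.3±2.0 | 6.7±2.6 | >0.05 |
| Alanine aminotransferase (U/L) | 19 (14-29) | 21 (15-38) | >0.05 |
| Aspartate aminotransferase (U/L) | 22 (19-29) | 24.5 (21-53) | >0.05 |
| Creatinine (μmol/L) | 82±26 | 82±51 | >0.05 |
| Total cholesterol (mmol/L) | 4.0±1.1 | 4.2±1.4 | >0.05 |
| Triglycerides (mmol/L) | 1.4±0.9 | 1.6±2.2 | >0.05 |
| High-density lipoprotein (mmol/L) | 1.1±0.4 | 1.1±0.3 | >0.05 |
| Low-density lipoprotein (mmol/L) | 2.2±0.8 | 2.3±1.0 | >0.05 |
| Apolipoprotein A1 (g/L) | 1.2±0.2 | 0.8±0.2 | >0.05 |
| Apolipoprotein B (g/L) | 0.7±0.2 | 0.8±0.2 | <0.05 |
| Carotid plaque | 74 (40.0) | 53 (48.2) | >0.05 |
| Maximum plaque area (mm²) | 0 (0-13) | 2 (0-42) | <0.01 |
| Hypoechoic plaque | 41 (22.2) | 27 (24.5) | >0.05 |
| Hyperechoic plaque | 39 (21.1) | 23 (20.9) | >0.05 |
| Mixed-echo plaque | 15 (8.1) | 28 (25.5) | <0.01 |
| Unstable plaque | 51 (27.6) | 47 (42.7) | <0.01 |
| Carotid stenosis | 7 (3.8) | 11 (10.0) | <0.05 |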
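Table 6  Binary Logistic regression analysis for predicting multi-vessel coronary artery disease

| Variable | OR | 95% CI | P |
| --- | --- | --- | --- |
| Carotid stenosis | 0.858 | 0.236-3.124 | >0.05 |
| Unstable plaque | 1.097 | 0.579-2.077 | >0.05 |
| Maximum plaque area | 1.013 | 1.001-1.027 | <0.05 |
| Age | 1.016 | 0.993-1.040 | >0.05 |
| Female | 0.463 | 0.266-0.809 | <0.01 |
| Coronary artery disease | 1.800 | 1.065-3.044 | <0.05 |
| Diabetes mellitus | 2.795 | 1.544-5.060 | <0.01 |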
Figure 3  ROC curve of maximum carotid plaque area for predicting multi-vessel coronary artery disease