J Zhejiang Univ (Med Sci)  2019, Vol. 48 Issue (6): 594-602    DOI: 10.3785/j.issn.1008-9292.2019.12.02
    
Application of Logistic regression and decision tree analysis in prediction of acute myocardial infarction events
ZHANG Sheng1, HU Zhenjie2, YE Lu3, ZHENG Yaru4,*
1. Department of Neurology, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou 310014, China
2. Department of Respiratory and Critical Medicine, No. 906 Hospital of Chinese PLA, Ningbo 315040, China
3. Clinical Laboratory, Mental Health Center of Zhejiang University School of Medicine, Hangzhou Seventh People's Hospital, Hangzhou 310013, China
4. Department of Cardiology, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou 310014, China

Abstract  

Objective: To evaluate the application of the decision tree method and Logistic regression in the prediction of acute myocardial infarction (AMI) events. Methods: The clinical data of 295 patients who underwent coronary angiography for angina or chest pain of unidentified cause at Zhejiang Provincial People's Hospital between October 2018 and April 2019 were retrospectively analyzed. Fifty-five patients were diagnosed with AMI. Logistic regression and decision tree analysis were each used to establish a predictive model for the occurrence of AMI; the decision tree models were built either independently of the Logistic regression results (Tree 1) or based on them (Tree 2). The performance of the Logistic regression and decision tree models was compared using the area under the receiver operating characteristic (ROC) curve. Results: Logistic regression analysis showed that history of coronary artery disease, multi-vessel coronary artery disease, statin use and apolipoprotein A1 (ApoA1) level were independent influencing factors for AMI events (all P < 0.05). In the Logistic regression-independent decision tree model (Tree 1), multi-vessel coronary artery disease was the root node, and history of coronary artery disease, ApoA1 level (cutoff value: 1.314 g/L) and anti-platelet drug use were descendant nodes. In the Logistic regression-dependent decision tree model (Tree 2), multi-vessel coronary artery disease was still the root node, but it was followed by only two descendant nodes: history of coronary artery disease and ApoA1 level. The area under the ROC curve (AUC) of the Logistic regression model was 0.826, and the AUCs of Tree 1 and Tree 2 were 0.765 and 0.726, respectively. The AUC of the Logistic regression model was significantly higher than that of Tree 2 (95% CI: 0.041 to 0.145, Z=3.534, P < 0.001), but was not significantly higher than that of Tree 1 (95% CI: -0.014 to 0.121, Z=-1.173, P > 0.05).
Conclusion: The predictive value for AMI events was comparable between the Logistic regression-independent decision tree model and the Logistic regression model, implying that data mining methods are feasible and effective in AMI prevention and control.



Key words: Myocardial infarction; Acute disease; Logistic models; Regression analysis; Decision trees; Forecasting
Received: 05 June 2019      Published: 19 January 2020
CLC:  R542.2+2  
Corresponding Authors: ZHENG Yaru     E-mail: xiaoxiaoqing_23@hotmail.com;zhengyaru@zjheart.com
Cite this article:

ZHANG Sheng,HU Zhenjie,YE Lu,ZHENG Yaru. Application of Logistic regression and decision tree analysis in prediction of acute myocardial infarction events. J Zhejiang Univ (Med Sci), 2019, 48(6): 594-602.

URL:

http://www.zjujournals.com/med/10.3785/j.issn.1008-9292.2019.12.02     OR     http://www.zjujournals.com/med/Y2019/V48/I6/594


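Application of decision tree analysis in prediction of acute myocardial infarction events

Objective: To evaluate and compare the feasibility and effectiveness of Logistic regression and decision tree analysis for predicting acute myocardial infarction (AMI) events. Methods: The clinical data of 295 patients who underwent elective coronary angiography for angina or chest pain of unknown cause at Zhejiang Provincial People's Hospital between October 2018 and April 2019 were retrospectively analyzed; 55 of them were diagnosed with AMI. Logistic regression and decision tree analysis were each used to build a prediction model for AMI events, with decision tree models built both without and with reference to the Logistic regression results (Tree 1 and Tree 2, respectively). ROC curves were then used to assess the value of the three models in predicting AMI. Results: Binary Logistic regression showed that history of coronary artery disease, multi-vessel coronary artery disease, statin use and apolipoprotein A1 were independent influencing factors for AMI (all P < 0.05). In the decision tree built without reference to the Logistic regression results (Tree 1), multi-vessel coronary artery disease was the root node, followed by history of coronary artery disease, apolipoprotein A1 level (cutoff 1.314 g/L) and antiplatelet drug use as child nodes. In the decision tree built from the Logistic regression results (Tree 2), multi-vessel coronary artery disease was the root node, followed by history of coronary artery disease and apolipoprotein A1 as child nodes. For predicting AMI events, the AUC of the Logistic regression model was 0.826, and the AUCs of the decision tree models were 0.765 (Tree 1) and 0.726 (Tree 2). Comparison among the three models showed that the AUC of the Logistic regression model was higher than that of Tree 2 (95% CI: 0.041 to 0.145, Z=3.534, P < 0.01), but did not differ significantly from that of Tree 1 (95% CI: -0.014 to 0.121, Z=-1.173, P > 0.05). Conclusion: In predicting AMI events, the decision tree model built without reference to the Logistic regression results performed comparably to the Logistic regression model and may in future be applied to the prevention and treatment of AMI.


Key words: Myocardial infarction; Acute disease; Logistic models; Regression analysis; Decision trees; Forecasting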
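| Variable | Non-AMI group (n=240) | AMI group (n=55) | P |
|---|---|---|---|
| Age (years) | 65 (54-71) | 66 (60-76) | <0.05 |
| Female | 95 (39.6) | 15 (27.3) | >0.05 |
| History of hypertension | 153 (36.3) | 34 (61.8) | >0.05 |
| History of diabetes | 49 (20.4) | 16 (29.1) | >0.05 |
| History of atrial fibrillation | 15 (6.3) | 7 (12.7) | >0.05 |
| History of coronary artery disease | 95 (39.6) | 11 (20.0) | <0.01 |
| History of myocardial infarction | 10 (4.2) | 1 (1.8) | >0.05 |
| History of stroke | 11 (4.6) | 3 (5.5) | >0.05 |
| Antiplatelet drug use | 57 (23.8) | 3 (5.5) | <0.01 |
| Statin use | 59 (24.6) | 2 (3.6) | <0.01 |
| Anticoagulant use | 5 (2.1) | 0 (0) | >0.05 |
| Previous coronary stent implantation | 45 (18.8) | 2 (3.6) | <0.01 |
| White blood cell count (×10^9/L) | 6.2±1.8 | 8.2±3.0 | <0.01 |
| Alanine aminotransferase (U/L) | 19 (14-27) | 30 (15-46) | >0.05 |
| Aspartate aminotransferase (U/L) | 22 (19-28) | 60 (25-201) | >0.05 |
| Creatinine (μmol/L) | 85.7±40.0 | 87.0±26.1 | >0.05 |
| Total cholesterol (mmol/L) | 4.2±1.3 | 4.2±1.0 | >0.05 |
| Triglycerides (mmol/L) | 1.6±1.7 | 1.1±0.5 | >0.05 |
| High-density lipoprotein (mmol/L) | 1.1±0.4 | 1.1±0.2 | >0.05 |
| Low-density lipoprotein (mmol/L) | 2.2±0.9 | 2.6±0.8 | <0.05 |
| Apolipoprotein A1 (g/L) | 1.2±0.2 | 1.1±0.2 | <0.01 |
| Apolipoprotein B (g/L) | 0.7±0.2 | 0.8±0.2 | >0.05 |
| Left anterior descending artery stenosis (%) | 42.3±34.3 | 77.0±27.6 | <0.01 |
| Left circumflex artery stenosis (%) | 0 (0-50) | 70 (0-90) | <0.01 |
| Right coronary artery stenosis (%) | 30 (0-50) | 70 (30-90) | <0.01 |
| Multi-vessel coronary artery disease | 71 (29.6) | 39 (70.9) | <0.01 |
| Carotid plaque | 104 (43.3) | 23 (41.8) | >0.05 |
| Maximum plaque area (mm^2) | 0 (0-18.4) | 0 (0-37.1) | >0.05 |
| Hypoechoic plaque | 57 (23.8) | 11 (20.0) | >0.05 |
| Hyperechoic plaque | 54 (22.5) | 8 (14.5) | >0.05 |
| Mixed-echo plaque | 33 (13.8) | 10 (18.2) | >0.05 |
| Unstable plaque | 79 (32.9) | 19 (34.5) | >0.05 |
| Carotid stenosis | 13 (5.4) | 5 (9.1) | >0.05 |

AMI: acute myocardial infarction.
Tab 1 Univariate analysis on predicting factors for AMI  [M(IQR), n(%), or $\bar{x} \pm s$]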
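
The categorical rows of Tab 1 compare counts between the two groups. As a rough illustration of how such a comparison can be checked, the sketch below runs a Pearson chi-square test on the multi-vessel coronary artery disease row (39 of 55 AMI patients versus 71 of 240 non-AMI patients). The source does not state which test the authors used, so the choice of an uncorrected chi-square test is an assumption, and the function names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def crude_odds_ratio(a, b, c, d):
    """Unadjusted odds ratio for the same 2x2 layout."""
    return (a * d) / (b * c)

# Counts taken from the multi-vessel disease row of Tab 1.
ami_yes, ami_no = 39, 55 - 39      # AMI group: with / without multi-vessel disease
non_yes, non_no = 71, 240 - 71     # non-AMI group: with / without multi-vessel disease

chi2, p_value = chi_square_2x2(ami_yes, ami_no, non_yes, non_no)
crude_or = crude_odds_ratio(ami_yes, ami_no, non_yes, non_no)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, crude OR = {crude_or:.2f}")
```

Under these assumptions the test agrees with the reported P < 0.01. The crude odds ratio is unadjusted, so it is not expected to match the multivariable estimate in Tab 2.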
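| Variable | OR | 95% CI | P |
|---|---|---|---|
| History of coronary artery disease | 0.280 | 0.116-0.673 | <0.01 |
| Antiplatelet drug use | 1.368 | 0.280-6.673 | >0.05 |
| Statin use | 0.060 | 0.006-0.638 | <0.05 |
| Previous coronary stent implantation | 0.593 | 0.095-3.691 | >0.05 |
| Low-density lipoprotein | 0.945 | 0.664-1.345 | >0.05 |
| Aspartate aminotransferase | 1.000 | 0.999-1.002 | >0.05 |
| Apolipoprotein A1 | 0.112 | 0.020-0.626 | <0.05 |
| Multi-vessel coronary artery disease | 8.981 | 4.216-19.128 | <0.01 |

Tab 2 Binary Logistic regression analysis for predicting acute myocardial infarction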
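
Each odds ratio in Tab 2 corresponds to a Logistic regression coefficient on the log-odds scale. The sketch below recovers the coefficient, an approximate standard error and a Wald z statistic from a reported OR and its 95% CI. It assumes the intervals are Wald intervals (coefficient ± 1.96 standard errors), which the source does not state; `wald_from_or` is a hypothetical helper, and the rounding of the published values limits the precision of the result.

```python
import math

def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back-calculate beta, SE, Wald z and two-sided p from an OR and its CI.

    Assumes a Wald interval: exp(beta - z_crit * SE) to exp(beta + z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# Rows copied from Tab 2: (label, OR, lower CI bound, upper CI bound)
rows = [
    ("History of coronary artery disease", 0.280, 0.116, 0.673),
    ("Statin use", 0.060, 0.006, 0.638),
    ("Apolipoprotein A1", 0.112, 0.020, 0.626),
    ("Multi-vessel coronary artery disease", 8.981, 4.216, 19.128),
]

for label, odds_ratio, low, high in rows:
    beta, se, z, p = wald_from_or(odds_ratio, low, high)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

A negative coefficient (OR below 1) means the variable is associated with lower odds of AMI in this model, and a positive one with higher odds.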
Fig 1 Logistic regression-independent decision tree analysis for predicting AMI
Fig 2 ROCs of Logistic regression model and decision tree model in predicting AMI
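
The AUC summarized by an ROC curve can be read as the probability that a randomly chosen AMI patient receives a higher predicted score than a randomly chosen non-AMI patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The scores are made-up illustrations, not data from the study, and the source does not describe how its AUCs were computed.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as P(positive score > negative score) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted probabilities from a model with continuous output.
continuous_auc = auc_pairwise([0.9, 0.7, 0.4], [0.5, 0.3, 0.3, 0.1])

# A tree with few leaves produces few distinct scores, so ties are common.
tree_like_auc = auc_pairwise([1, 1, 0], [1, 0, 0, 0])

print(round(continuous_auc, 3), round(tree_like_auc, 3))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formula gives the same value more efficiently on larger data.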
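| Decision tree | Variable | Level* | P# |
|---|---|---|---|
| Tree 1 | Multi-vessel coronary artery disease | 1 | 0.031 |
| | History of coronary artery disease | 2 | 0.045 |
| | Apolipoprotein A1 | 3 | 0.020 |
| | Antiplatelet drug use | 3 | 0.001 |
| Tree 2 | Multi-vessel coronary artery disease | 1 | 0.035 |
| | History of coronary artery disease | 2 | 0.027 |
| | Apolipoprotein A1 | 3 | 0.004 |

*Node level of each variable in the decision tree (level 1 is the root node; levels 2 and 3 are child nodes of successively lower rank); #Obtained by comparison after splitting the data at the node of each variable.
Tab 3 Variables selected by decision tree models after splitting the dataset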
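| Model | AUC | Standard error | P | 95% CI | Accuracy (%) | Sensitivity (%) | Specificity (%) | Youden index |
|---|---|---|---|---|---|---|---|---|
| Logistic regression | 0.826 | 0.032 | <0.01 | 0.762-0.889 | 86.2 | 75.9 | 79.7 | 0.56 |
| Tree 1 | 0.765 | 0.041 | <0.01 | 0.684-0.846 | 85.4 | 61.8 | 91.2 | 0.53 |
| Tree 2 | 0.726 | 0.044 | <0.01 | 0.641-0.812 | 85.1 | 52.7 | 92.5 | 0.45 |

AMI: acute myocardial infarction.
Tab 4 Comparison between Logistic regression model and decision tree model in predicting AMI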
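
The Youden index in Tab 4 follows from the reported sensitivity and specificity as J = sensitivity + specificity - 1. The short check below recomputes it for the three models from the tabulated percentages; the function and variable names are illustrative only.

```python
def youden_index(sensitivity_pct, specificity_pct):
    """Youden's J from sensitivity and specificity given in percent."""
    return (sensitivity_pct + specificity_pct) / 100 - 1

# (sensitivity %, specificity %, reported Youden index) copied from Tab 4
models = {
    "Logistic regression": (75.9, 79.7, 0.56),
    "Tree 1": (61.8, 91.2, 0.53),
    "Tree 2": (52.7, 92.5, 0.45),
}

recomputed = {}
for name, (sens, spec, reported) in models.items():
    recomputed[name] = round(youden_index(sens, spec), 2)
    print(f"{name}: recomputed J = {recomputed[name]:.2f}, reported J = {reported:.2f}")
```

The recomputed values agree with the reported ones to two decimal places, and they show the trade-off in the table: the trees gain specificity over Logistic regression but lose more sensitivity.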
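| Variable | No multi-vessel disease (n=185) | Multi-vessel disease (n=110) | P |
|---|---|---|---|
| Age (years) | 62±12 | 67±12 | <0.01 |
| Female | 82 (44.3) | 28 (25.5) | <0.01 |
| History of hypertension | 116 (62.7) | 71 (64.5) | >0.05 |
| History of diabetes | 28 (15.1) | 37 (33.6) | <0.01 |
| History of atrial fibrillation | 14 (7.6) | 8 (7.3) | >0.05 |
| History of coronary artery disease | 55 (29.7) | 51 (46.4) | <0.01 |
| History of myocardial infarction | 6 (3.2) | 5 (4.5) | >0.05 |
| History of stroke | 7 (3.8) | 7 (6.4) | >0.05 |
| Antiplatelet drug use | 37 (20.0) | 23 (20.9) | >0.05 |
| Statin use | 41 (22.2) | 20 (18.2) | >0.05 |
| Anticoagulant use | 4 (2.2) | 1 (0.9) | >0.05 |
| Previous coronary stent implantation | 27 (14.6) | 20 (18.2) | >0.05 |
| White blood cell count (×10^9/L) | 6.3±2.0 | 6.7±2.6 | >0.05 |
| Alanine aminotransferase (U/L) | 19 (14-29) | 21 (15-38) | >0.05 |
| Aspartate aminotransferase (U/L) | 22 (19-29) | 24.5 (21-53) | >0.05 |
| Creatinine (μmol/L) | 82±26 | 82±51 | >0.05 |
| Total cholesterol (mmol/L) | 4.0±1.1 | 4.2±1.4 | >0.05 |
| Triglycerides (mmol/L) | 1.4±0.9 | 1.6±2.2 | >0.05 |
| High-density lipoprotein (mmol/L) | 1.1±0.4 | 1.1±0.3 | >0.05 |
| Low-density lipoprotein (mmol/L) | 2.2±0.8 | 2.3±1.0 | >0.05 |
| Apolipoprotein A1 (g/L) | 1.2±0.2 | 0.8±0.2 | >0.05 |
| Apolipoprotein B (g/L) | 0.7±0.2 | 0.8±0.2 | <0.05 |
| Carotid plaque | 74 (40.0) | 53 (48.2) | >0.05 |
| Maximum plaque area (mm^2) | 0 (0-13) | 2 (0-42) | <0.01 |
| Hypoechoic plaque | 41 (22.2) | 27 (24.5) | >0.05 |
| Hyperechoic plaque | 39 (21.1) | 23 (20.9) | >0.05 |
| Mixed-echo plaque | 15 (8.1) | 28 (25.5) | <0.01 |
| Unstable plaque | 51 (27.6) | 47 (42.7) | <0.01 |
| Carotid stenosis | 7 (3.8) | 11 (10.0) | <0.05 |

Tab 5 Univariate analysis on predicting factors for multi-vessel coronary artery disease  [M(IQR), n(%), or $\bar{x} \pm s$]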
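| Variable | OR | 95% CI | P |
|---|---|---|---|
| Carotid stenosis | 0.858 | 0.236-3.124 | >0.05 |
| Unstable plaque | 1.097 | 0.579-2.077 | >0.05 |
| Maximum plaque area | 1.013 | 1.001-1.027 | <0.05 |
| Age | 1.016 | 0.993-1.040 | >0.05 |
| Female | 0.463 | 0.266-0.809 | <0.01 |
| Coronary artery disease | 1.800 | 1.065-3.044 | <0.05 |
| Diabetes | 2.795 | 1.544-5.060 | <0.01 |

Tab 6 Binary Logistic regression analysis for predicting multi-vessel coronary artery disease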
Fig 3 ROC of maximum plaque area of carotid artery predicting multi-vessel coronary artery disease