Acta Informatica Pragensia 2022, 11(1), 36-47 | DOI: 10.18267/j.aip.163

Predicting Mortality in Patients with Stroke Using Data Mining Techniques

Zahra Hadianfard1, Hadi Lotfnezhad Afshar1, Surena Nazarbaghi2, Bahlol Rahimi1, Toomas Timpka3
1 Department of Health Information Technology, School of Allied Medical Sciences, Urmia University of Medical Sciences, Urmia, Iran
2 Department of Neurology, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
3 Department of Health, Medicine, and Caring Sciences, Linköping University, Linköping, Sweden

Mortality due to stroke is increasing, and accurate prediction of stroke-related death is very important for healthcare. Data mining methods offer novel ways to predict these mortality risks. The aim of this study was to apply popular data mining algorithms to predict the survival of stroke patients and to extract decision rules. Data on stroke patients (n=4149) were collected from paper medical records. Missing data were handled using the multiple imputation method, and the target variable was balanced using methods such as over-sampling, under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE). The support vector machine (SVM), decision tree and logistic regression (LR) algorithms were employed to predict the survival of stroke patients, and the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm was used to extract decision rules from the main dataset. LR outperformed the other algorithms in accuracy (76.96%), sensitivity (79.06%) and kappa (33.34); however, its specificity (65.35%) and AUC (0.77) were lower than those of the other algorithms. An independent dataset of 234 records was then used to externally validate the LR model, which had performed best on the main dataset. On this external validation dataset, the model's accuracy (79.91%), sensitivity (83.94%), kappa (39.26) and AUC (0.8) improved, but its specificity (60.98%) did not. The constructed model achieved high scores in predicting the survival of stroke patients, and useful rules were extracted for clinical use.
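The abstract reports accuracy, sensitivity, specificity, kappa and AUC without defining them. The sketch below is a hedged illustration, not the authors' code: it computes the standard definitions of these metrics from a 2x2 confusion matrix and from predicted scores. The `Confusion` class, the function names, and all counts and scores are hypothetical. The abstract does not state which outcome (death or survival) was treated as the positive class. Its kappa values (e.g. 33.34) appear to be Cohen's kappa reported on a 0 to 100 scale; that is an interpretation, not something the source states.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 confusion matrix; 'positive' is whichever class is the event of interest."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def accuracy(c: Confusion) -> float:
    # Share of all records classified correctly.
    return (c.tp + c.tn) / c.n


def sensitivity(c: Confusion) -> float:
    # True positive rate: share of actual positives predicted as positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # True negative rate: share of actual negatives predicted as negative.
    return c.tn / (c.tn + c.fp)


def cohen_kappa(c: Confusion) -> float:
    # Agreement beyond chance: (p_o - p_e) / (1 - p_e).
    p_o = accuracy(c)
    # Chance agreement = P(actual pos) * P(pred pos) + P(actual neg) * P(pred neg).
    p_yes = ((c.tp + c.fn) / c.n) * ((c.tp + c.fp) / c.n)
    p_no = ((c.fp + c.tn) / c.n) * ((c.fn + c.tn) / c.n)
    p_e = p_yes + p_no
    return (p_o - p_e) / (1 - p_e)


def auc(labels: list[int], scores: list[float]) -> float:
    # Rank-based AUC: probability that a random positive (label 1) scores higher
    # than a random negative (label 0); ties count as half a win.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With the hypothetical matrix `Confusion(tp=40, fn=10, fp=20, tn=30)`, these functions give accuracy 0.7, sensitivity 0.8, specificity 0.6 and kappa 0.4 (40 on a 0 to 100 scale). Unlike accuracy, kappa discounts the agreement expected by chance alone, which is why it is much lower than accuracy when the classes are imbalanced. The pairwise AUC loop costs O(n_pos × n_neg), which is acceptable for a few thousand records; a sort-based rank computation scales better.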

Keywords: Data mining; Decision trees; Stroke; Survival; Logistic regression; Iran.

Received: September 20, 2021; Revised: November 17, 2021; Accepted: November 18, 2021; Prepublished online: November 21, 2021; Published: March 13, 2022

Hadianfard, Z., Lotfnezhad Afshar, H., Nazarbaghi, S., Rahimi, B., & Timpka, T. (2022). Predicting Mortality in Patients with Stroke Using Data Mining Techniques. Acta Informatica Pragensia, 11(1), 36-47. https://doi.org/10.18267/j.aip.163

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.