
Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests

  • Massimo Aria
  • Corrado Cuccurullo
  • Agostino Gnasso

The growing success of Machine Learning (ML) is significantly improving predictive models and facilitating their integration into various application fields, especially healthcare. However, ML still has limitations and drawbacks, such as a lack of interpretability, which prevents users from understanding how certain decisions are made. This drawback is captured by the term "Black-Box", which refers to models whose internal workings cannot be interpreted, and it discourages their use. In a highly regulated and risk-averse context such as healthcare, although "trust" is not synonymous with decision and adoption, trusting an ML model is essential for its adoption. Many clinicians and health researchers feel uncomfortable with black-box ML models, even when they achieve high diagnostic or prognostic accuracy. Therefore, more and more research is being conducted on how these models work. Our study focuses on the Random Forest (RF) model, one of the best-performing and most widely used ML methodologies in all fields of research, from the hard sciences to the humanities. In the health context and in the evaluation of health policies, its use is limited by the impossibility of interpreting the causal links between predictors and response. This explains why new techniques, tools, and approaches are needed for reconstructing the causal relationships and interactions between predictors and response used in an RF model. Our research performs a machine learning experiment on several medical datasets, comparing two methodologies, inTrees and NodeHarvest, which are the main approaches in the rule extraction framework. The contribution of our study is to identify, among the approaches to rule extraction, the best proposal to recommend to decision-makers in the health domain.

Keywords: Random Forest, Model Interpretation, Health domain, Rule Extraction

Massimo Aria

University of Naples Federico II, Italy - ORCID: 0000-0002-8517-9411

Corrado Cuccurullo

University of Campania Luigi Vanvitelli, Italy - ORCID: 0000-0002-7401-8575

Agostino Gnasso

University of Naples Federico II, Italy - ORCID: 0000-0002-9220-9754

PDF
  • Publication Year: 2021
  • Pages: 179-184
  • Content License: CC BY 4.0
  • © 2021 Author(s)


Chapter Information

Chapter Title: Supporting decision-makers in healthcare domain. A comparative study of two interpretative proposals for Random Forests
Authors: Massimo Aria, Corrado Cuccurullo, Agostino Gnasso
Language: English
DOI: 10.36253/978-88-5518-461-8.34
Peer Reviewed
Publication Year: 2021
Copyright Information: © 2021 Author(s)
Content License: CC BY 4.0
Metadata License: CC0 1.0

Bibliographic Information

Book Title: ASA 2021 Statistics and Information Systems for Policy Evaluation
Book Subtitle: BOOK OF SHORT PAPERS of the on-site conference
Editors: Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci
Peer Reviewed
Publication Year: 2021
Copyright Information: © 2021 Author(s)
Content License: CC BY 4.0
Metadata License: CC0 1.0
Publisher Name: Firenze University Press
DOI: 10.36253/978-88-5518-461-8
eISBN (pdf): 978-88-5518-461-8
eISBN (xml): 978-88-5518-462-5
Series Title: Proceedings e report
Series ISSN: 2704-601X
Series E-ISSN: 2704-5846

