Tree-based models are a popular tool for predicting a response given a set of explanatory variables when the regression function is characterized by a certain degree of complexity. Sometimes, they are also used to identify important variables and for variable selection. We show that if the generating model contains chains of direct and indirect effects, then the typical variable importance measures suggest selecting as important mainly the background variables, which have a strong indirect effect, disregarding the variables that directly influence the response. This is attributable mainly to the variable choice in the first steps of the algorithm selecting the splitting variable and to the greedy nature of such search. This pitfall could be relevant when using tree-based algorithms for understanding the underlying generating process, for population segmentation and for causal inference.
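The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper. The data-generating process (one background variable driving four direct causes, all with unit-variance Gaussian noise), the names `simulate` and `best_split_gain`, and the sample size are assumptions chosen for illustration. The background variable has no direct effect on the response, yet its population correlation with the response (about 0.87) exceeds that of any single direct cause (about 0.77), so a greedy search for the first split would be expected to favour it.

```python
import random


def best_split_gain(x, y):
    """Share of the root sum of squared errors removed by the best single split on x."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(y)
    sse_root = sum(v * v for v in y) - total * total / n
    best = 0.0
    left_sum = 0.0
    for pos in range(n - 1):
        left_sum += y[order[pos]]
        # A split point must separate two distinct x values.
        if x[order[pos]] == x[order[pos + 1]]:
            continue
        n_left = pos + 1
        n_right = n - n_left
        right_sum = total - left_sum
        # SSE(root) - SSE(left) - SSE(right), written with group sums.
        gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / n
        best = max(best, gain)
    return best / sse_root


def simulate(n=2000, k=4, seed=0):
    """Hypothetical chain: background -> k direct causes -> response."""
    rng = random.Random(seed)
    columns = {"background": []}
    for j in range(k):
        columns[f"direct_{j + 1}"] = []
    y = []
    for _ in range(n):
        b = rng.gauss(0, 1)
        directs = [b + rng.gauss(0, 1) for _ in range(k)]
        columns["background"].append(b)
        for j, value in enumerate(directs):
            columns[f"direct_{j + 1}"].append(value)
        # The response depends only on the direct causes, never on b itself.
        y.append(sum(directs) + rng.gauss(0, 1))
    return columns, y


columns, y = simulate()
gains = {name: best_split_gain(col, y) for name, col in columns.items()}
root_variable = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {gain:.3f}")
print("variable chosen for the first split:", root_variable)
```

This only covers the first greedy split of a single regression tree. Importance measures from a full tree or an ensemble aggregate many such splits, so the sketch shows where the bias can originate rather than reproducing the paper's results.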

A note on the interpretation of tree-based regression models / Gottard, Anna; Vannucci, Giulia; Marchetti, Giovanni Maria. - In: BIOMETRICAL JOURNAL. - ISSN 0323-3847. - (2020), pp. 1-10. [10.1002/bimj.201900195]

A note on the interpretation of tree-based regression models

Vannucci, Giulia
2020

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/954605