Dynamic Classification Trees for imprecise data / Aria, Massimo; Cozza, Valentina. - Electronic. - 1:(2012), pp. 1-8. (Paper presented at the 46th Scientific Meeting of the Italian Statistical Society (SIS2012), held in Rome, 20-22 June).
Dynamic Classification Trees for imprecise data
ARIA, MASSIMO;COZZA, VALENTINA
2012
Abstract
This paper provides a tree-based supervised classification methodology for multivalued data, that is, data in which a predictor's measurement may be given as a functional distribution or as an interval of values. The main literature refers to symbolic data analysis, which aims to extend standard methods such as factorial analysis, clustering, and discriminant analysis to symbolic data tables. One approach is to define a suitable data pre-processing step that enables the application of standard methods. A more appropriate approach is to define methods that deal specifically with non-standard data. In the framework of supervised classification, no proposal in the literature handles both standard and multivalued data; the existing proposals are all based on data pre-processing. This paper provides a methodology to grow the so-called Dynamic CLASSification TREE (D-CLASSTREE), based on a suitable definition of both a specific splitting criterion and a tree-growing algorithm. A real-world case study is considered to show the advantages of the final output and the main issues in its interpretation. A comparative study with earlier proposals is also described, to demonstrate the stability and the better accuracy of the D-CLASSTREE.
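To make the contrast in the abstract concrete, the following is a minimal sketch of the pre-processing route that the abstract describes as the existing alternative. It is not the D-CLASSTREE splitting criterion, which the abstract does not specify. The sketch assumes that each interval-valued measurement is summarized by its midpoint and half-range, and that a standard Gini-based threshold split is then applied to the midpoints. The function names, the choice of summary, and the toy data are all hypothetical.

```python
from collections import Counter


def interval_to_features(lower, upper):
    """Summarize an interval [lower, upper] by (midpoint, half-range).

    Assumed pre-processing choice, used only for illustration.
    """
    return ((lower + upper) / 2.0, (upper - lower) / 2.0)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on a numeric predictor minimizing weighted Gini.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: four interval-valued measurements and their classes.
intervals = [(1.0, 3.0), (2.0, 4.0), (6.0, 8.0), (7.0, 11.0)]
labels = ["a", "a", "b", "b"]

features = [interval_to_features(lo, hi) for lo, hi in intervals]
midpoints = [mid for mid, _ in features]
threshold, impurity = best_split(midpoints, labels)
```

Reducing each interval to a midpoint discards the information carried by its width, which is one reason a method defined directly on multivalued data may be preferable to pre-processing.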