Dynamic Classification Trees for imprecise data

Aria, Massimo; Cozza, Valentina

This paper provides a tree-based supervised classification methodology for multivalued data, in which the measurements of a predictor may be given as a distribution function or as an interval of values. The main reference literature is symbolic data analysis, which aims to extend standard methods such as factorial analysis, clustering and discriminant analysis to symbolic data tables. One approach is to define a suitable data pre-processing step that allows standard methods to be applied. A more appropriate approach is to define methods that deal directly with non-standard data. In the framework of supervised classification, the literature offers no methods that handle standard and multivalued data together; the existing proposals rely only on data pre-processing. This paper provides a methodology to grow the so-called Dynamic CLASSification TREE (D-CLASSTREE), based on the definition of both a specific splitting criterion and a tree-growing algorithm. A real-world case study is considered to show the advantages of the final output and the main issues in its interpretation. A comparative study with earlier proposals is also described to demonstrate the stability and better accuracy of D-CLASSTREE.
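
To make the contrast between pre-processing and handling interval-valued predictors directly more concrete, the following is a minimal sketch of one possible way to score a binary split on an interval predictor without first reducing each interval to a single number. It is not the D-CLASSTREE splitting criterion, which the abstract does not specify. Every name here (`IntervalObs`, `left_fraction`, `split_impurity`, `best_split`) is hypothetical. The sketch assumes that each observation's values are spread uniformly over its interval, so that an interval straddling the threshold is sent fractionally to both children, and it uses the weighted Gini impurity as a stand-in goodness-of-split measure.

```python
from dataclasses import dataclass


@dataclass
class IntervalObs:
    """One observation: an interval-valued predictor [low, high] and a class label."""
    low: float
    high: float
    label: str


def left_fraction(obs: IntervalObs, threshold: float) -> float:
    """Share of [low, high] at or below threshold, assuming a uniform spread (hypothetical rule)."""
    if obs.high <= threshold:
        return 1.0
    if obs.low >= threshold:
        return 0.0
    return (threshold - obs.low) / (obs.high - obs.low)


def gini(weights_by_label: dict) -> float:
    """Gini impurity of a node whose class counts may be fractional."""
    total = sum(weights_by_label.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weights_by_label.values())


def split_impurity(data: list, threshold: float) -> float:
    """Weighted Gini impurity of the two children when intervals are split fractionally."""
    left, right = {}, {}
    for obs in data:
        f = left_fraction(obs, threshold)
        left[obs.label] = left.get(obs.label, 0.0) + f
        right[obs.label] = right.get(obs.label, 0.0) + (1.0 - f)
    n = len(data)
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


def best_split(data: list) -> float:
    """Try every interval endpoint as a threshold and keep the first one with minimum impurity."""
    candidates = sorted({v for obs in data for v in (obs.low, obs.high)})
    return min(candidates, key=lambda t: split_impurity(data, t))


data = [
    IntervalObs(1, 3, "A"),
    IntervalObs(2, 4, "A"),
    IntervalObs(6, 8, "B"),
    IntervalObs(7, 9, "B"),
]
print(best_split(data))  # 4: every "A" interval lies at or below 4, every "B" interval above it
```

A midpoint pre-processing step would collapse each interval to one value before splitting; the fractional assignment above instead keeps the width of each interval in play, which is the kind of information a method designed for non-standard data can exploit.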

Dynamic Classification Trees for imprecise data / Aria, Massimo; Cozza, Valentina. - Electronic. - 1 (2012), pp. 1-8. (46th Scientific Meeting of the Italian Statistical Society (SIS2012), Rome, 20-22 June.)