Ternary classification trees for imprecise data

Siciliano, Roberta; Aria, Massimo; Cozza, Valentina; D'Ambrosio, Antonio

The framework of this work is Vapnik's statistical learning theory, i.e. learning from experience (the training sample) in order to generalize and provide useful answers (predictions, decisions) in new cases. The goal is to identify the learning machine characterized by the best functional relationship between input and output, so as to approximate the supervisor's response while minimizing the loss due to discrepancy or error. A classification tree-based supervisor is considered: the predictor space (input) is recursively partitioned to induce a partition of the sample of cases into disjoint subgroups that are internally homogeneous and externally heterogeneous with respect to a categorical (often dummy) response variable (output). Predictors are usually numerical or categorical, with point-valued measurements. This paper provides a supervised classification tree-based methodology for imprecise data, specifically for cases in which a predictor's measurements are given by a functional distribution or by an interval of values. The proposed recursive ternary partitioning algorithm better discriminates the ordering relationships and the imprecision of the case measurements. Data structures of this type occur in many real-life applications, where training data come with intrinsic uncertainty that may result from imprecise measuring instruments, as in image recognition (in medicine, physics, robotics, etc.), or from human judgements and observations in socio-economic fields. The proposed approach can therefore be understood as a "subjectivistic" view of imprecision that formalizes the uncertainty concerning an underlying "crisp" phenomenon.
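As an illustration only, the idea of a ternary split on an interval-valued predictor can be sketched as follows. The abstract does not specify the splitting rule or the impurity criterion, so everything here is an assumption: the rule (intervals entirely at or below a threshold go left, intervals entirely above go right, intervals straddling the threshold go to a middle node), the use of the Gini index, and the names `ternary_split`, `gini` and `split_impurity` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ternary_split(intervals, threshold):
    """Assign case indices to 'left', 'middle' or 'right'.

    ASSUMED rule (not from the paper): a case goes left if its whole
    interval is <= threshold, right if its whole interval is > threshold,
    and to the middle node if the interval straddles the threshold.
    """
    groups = {"left": [], "middle": [], "right": []}
    for i, (lo, hi) in enumerate(intervals):
        if hi <= threshold:
            groups["left"].append(i)
        elif lo > threshold:
            groups["right"].append(i)
        else:
            groups["middle"].append(i)
    return groups


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(intervals, labels, threshold):
    """Size-weighted Gini impurity over the three child nodes."""
    groups = ternary_split(intervals, threshold)
    n = len(labels)
    return sum(
        len(idx) / n * gini([labels[i] for i in idx])
        for idx in groups.values()
    )


# Hypothetical toy data: five cases measured as [lower, upper] intervals.
intervals = [(1.0, 2.0), (1.5, 2.5), (2.0, 4.0), (3.5, 5.0), (4.5, 6.0)]
labels = ["a", "a", "b", "b", "b"]

print(ternary_split(intervals, 3.0))
print(split_impurity(intervals, labels, 3.0))
```

Under this assumed rule, the middle node collects exactly those cases whose imprecision prevents them from being ordered unambiguously relative to the threshold, which is one way a third branch could account for both ordering and imprecision.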

Ternary classification trees for imprecise data / Siciliano, R., Aria, M., Cozza, V., D'Ambrosio, A. (2011). 4th International Conference of the ERCIM Working Group on Computing and Statistics (ERCIM '11), London, 17-19 December 2011.