This paper defines multivalued data, also called multiple-valued variables. Such data typically arise when data production carries some intrinsic uncertainty, for example because of imprecise measuring instruments, as in image recognition, or because of human judgment. So far, contributions in the symbolic data analysis literature have provided data preprocessing criteria that allow the use of standard methods such as factorial analysis, clustering, discriminant analysis, and tree-based methods. As an alternative, this paper introduces a methodology for supervised classification, the so-called Dynamic CLASSification TREE (D-CLASS TREE), which deals simultaneously with standard and multivalued data. To this end, an innovative partitioning criterion and a tree-growing algorithm are defined. The main result is a dynamic tree structure characterized by the simultaneous presence of binary and ternary partitions. A real-world case study shows the advantages of the proposed methodology and the main issues in interpreting the final results. A comparative study with other approaches dealing with the same types of data is also presented. The comparison highlights that, although the results are quite similar in terms of error rates, the proposed D-CLASS TREE returns a more interpretable tree-based structure.
Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images / Aria, Massimo; D'Ambrosio, Antonio; Iorio, Carmela; Siciliano, Roberta; Cozza, Valentina. - In: STATISTICAL PAPERS. - ISSN 0932-5026. - 61:(2020), pp. 1645-1661. [10.1007/s00362-018-0997-x]
Dynamic recursive tree-based partitioning for malignant melanoma identification in skin lesion dermoscopic images
Massimo Aria; Antonio D'Ambrosio; Carmela Iorio; Roberta Siciliano; Valentina Cozza
2020
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.