The output of hierarchical clustering methods is typically displayed as a dendrogram describing a family of nested partitions. However, the usual approach of cutting the dendrogram horizontally at a given distance level explores only a restricted subset of all the partitions the tree encodes. We proposed an algorithm, DESPOTA (DEndrogram Slicing through a PermutatiOn Test Approach; Bruzzese and Vistocco, 2015), that exploits the methodological framework of permutation tests (Pesarin and Salmaso, 2010) to automatically find a partition whose clusters need not all lie at the same cut level. DESPOTA offers the final user a validated partition, and it adapts to any choice of the distance metric and agglomeration criterion used to grow the tree. The algorithm retraces the tree downward, starting from the root of the dendrogram, where all objects are classified in a single cluster, and lowering a partial threshold until a link joining two clusters is encountered. A permutation test is then performed to verify whether the two clusters should be considered a single group (the null hypothesis) or not (the alternative hypothesis). If the null hypothesis cannot be rejected, the corresponding branch becomes an element of the final partition and none of its sub-branches is processed any further. Otherwise, each of the two sub-branches is visited in turn later in the procedure. DESPOTA is compared with competing methods (Gurrutxaga, 2010; Milligan, 1981; Tibshirani, 2001) on both real and synthetic datasets. The results show that DESPOTA performs well across different data and cluster structures.
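The top-down traversal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the test statistic, so the sketch substitutes a simple two-sample permutation test on the absolute difference of means of one-dimensional data. The nested-tuple tree representation, the function names `leaves`, `split_p_value` and `slice_tree`, and the values `alpha=0.05` and `n_perm=999` are all assumptions made for illustration.

```python
import random


def leaves(node):
    """Flatten a nested-tuple tree into the list of its leaf values."""
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def mean(xs):
    return sum(xs) / len(xs)


def split_p_value(left, right, n_perm=999, rng=None):
    """Stand-in permutation test (NOT the DESPOTA statistic).

    H0: `left` and `right` form a single group.
    Statistic: absolute difference of the two group means.
    """
    if rng is None:
        rng = random.Random(0)
    observed = abs(mean(left) - mean(right))
    pooled = left + right  # new list, so the inputs are not mutated
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(left)], pooled[len(left):]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def slice_tree(node, alpha=0.05, rng=None):
    """Walk the tree from the root, testing each link joining two clusters.

    If H0 is not rejected, the whole branch becomes one cluster of the
    final partition and its sub-branches are not visited. Otherwise both
    sub-branches are visited recursively.
    """
    if rng is None:
        rng = random.Random(0)
    if not isinstance(node, tuple):
        return [[node]]
    left, right = leaves(node[0]), leaves(node[1])
    if split_p_value(left, right, rng=rng) > alpha:
        return [left + right]
    return slice_tree(node[0], alpha, rng) + slice_tree(node[1], alpha, rng)


# Hypothetical toy dendrogram: three well-separated groups of five points.
group_a = ((0.0, 0.2), (0.5, (0.8, 1.0)))
group_b = ((10.0, 10.2), (10.5, (10.8, 11.0)))
group_c = ((30.0, 30.2), (30.5, (30.8, 31.0)))
tree = (group_a, (group_b, group_c))

clusters = slice_tree(tree)
for cluster in clusters:
    print(cluster)
```

On this toy tree the sketch should stop at the three well-separated groups: the links at the root and between the second and third groups are rejected, while the links inside each group are not. With a real dendrogram, the same recursion can end at different heights in different branches, which is how the resulting partition can differ from any single horizontal cut.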

DESPOTA: an algorithm to automatically detect a reliable partition on a dendrogram / Bruzzese, Dario; Passaretti, Davide; Vistocco, Domenico. - (2015). (Paper presented at the conference CARME 2015 - Correspondence Analysis and Related Methods, held in Naples, Italy, September 20-23).

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/744328