A key element in the success of data analysis is the strong contribution of visualization: dendrograms and factorial planes are intuitive ways to display association relationships within and among sets of variables and groups of units. In Association Rules (AR) mining we refer to an n × p data matrix, where n indicates the number of statistical units and p the number of attributes, also called items. The problem consists in analyzing links between attributes. Sets of attributes that co-occur throughout the data matrix are referred to as patterns. Scanning the whole data set and analyzing all the relationships is an interesting and promising approach, yet it leads to an NP-hard problem and becomes infeasible when dealing with a large number of attributes. Moreover, in some cases the most interesting relationships concern subpopulations in the data that are hidden by the obvious ones and cannot be identified by classical descriptive and inferential statistical methods. Jointly using factorial and clustering methods in a unified exploratory approach copes with these issues. The analyst can identify the most interesting groups of units and sets of attributes and, by focusing attention only on them, more easily identify interesting patterns in large and very large binary databases.
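To make the terminology concrete, the following minimal sketch computes the support of attribute sets in a small binary matrix and enumerates every candidate pattern by brute force. It is an illustration only, not the method proposed in the paper: the toy matrix `X`, the threshold `min_support`, and the function names are all hypothetical. The exhaustive scan visits 2^p − 1 candidate itemsets, which illustrates why it stops being practical as the number of attributes p grows.

```python
from itertools import combinations

# Hypothetical toy data: 6 units (rows) by 4 binary attributes/items (columns).
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]

def support(data, itemset):
    """Fraction of units in which every attribute of `itemset` equals 1."""
    hits = sum(all(row[j] for j in itemset) for row in data)
    return hits / len(data)

def frequent_patterns(data, min_support):
    """Brute-force scan of all non-empty itemsets (2**p - 1 candidates)."""
    p = len(data[0])
    found = {}
    for size in range(1, p + 1):
        for itemset in combinations(range(p), size):
            s = support(data, itemset)
            if s >= min_support:
                found[itemset] = s
    return found

# Assumed threshold, chosen only for the illustration.
patterns = frequent_patterns(X, min_support=0.5)
n_candidates = 2 ** len(X[0]) - 1  # candidate itemsets examined by the scan
```

With p = 4 the scan examines only 15 candidates, but the count doubles with each added attribute. This is the motivation the abstract gives for first narrowing attention to selected groups of units and sets of attributes.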

Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data Bases / Palumbo, Francesco. - (2008). (Paper presented at the 31st Gesellschaft für Klassifikation conference, held in Hamburg, 16-18 July 2008).

Clustering and Dimensionality Reduction to Discover Interesting Patterns in Binary Data Bases

PALUMBO, FRANCESCO
2008



Use this identifier to cite or link to this document: https://hdl.handle.net/11588/381111