Data accumulate, and there is a growing need for automated systems that partition data into groups in order to describe, organize, and retrieve information. In document databases, one of the main aims is text categorization, which consists of identifying documents with similar topics. In the usual vector space model, documents are represented as points in the high-dimensional space spanned by words. An obstacle to the efficient performance of algorithms is the curse of dimensionality: as the number of dimensions increases, the space in which individuals are represented becomes sparse. Classical statistical methods lose their properties, and interest shifts to finding dense areas in lower-dimensional spaces. The aim of the paper is to review the basic literature on the topic, focusing on dimensionality reduction and double clustering.

Beyond the curse of multidimensionality: high dimensional clustering in text mining / Balbi, Simona. - In: STATISTICA APPLICATA. - ISSN 1125-1964. - 22:1(2012), pp. 53-63.

Beyond the curse of multidimensionality: high dimensional clustering in text mining

BALBI, SIMONA
2012


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/542716