Mining the Ambiguity: Correspondence and network analysis for discovering word sense / Balbi, Simona; Stawinoga, Agnieszka Elzbieta. - (2013). (Paper presented at the conference SIS 2013: "Advances in Latent Variables, Methods, Models and Applications", held in Brescia, Italy, 19-21 June 2013).
Mining the Ambiguity: Correspondence and network analysis for discovering word sense
Balbi, Simona; Stawinoga, Agnieszka Elzbieta
2013
Abstract
If language is modelled as a network of words, mining knowledge from textual databases is difficult because of their high dimensionality and the ambiguity that characterises words and their use. From a methodological viewpoint, we propose a strategy that highlights the differences between the manifest relations emerging from Network Analysis (NA) and the latent relations obtained by lexical Correspondence Analysis (CA). The aim of this paper is to address the word-sense disambiguation problem not in the usual pre-processing step but during the analysis itself. Results from the analysis of a management commentary are presented in order to propose some statistical lexical sources useful in the specific domain of business information.