Word ambiguity is a crucial issue in the automatic analysis of document databases. In Text Mining, Word Sense Disambiguation (WSD) is the task of assigning a specific sense to a term that has several meanings, whether the homographs are coincidental or polysemous. In the literature, the proposed solutions rely mainly on two elements: some knowledge related to the term, and the context in which the term appears. Limiting the related knowledge to grammatical tagging and to an analysis of collocations, here we focus on identifying the context with a data-driven approach. Our framework is based on Textual Data Analysis, and we assume that language and knowledge can be modeled as networks of words and the relations between them. The aim of this paper is to propose an extension of the strategy for building lexical sources in Balbi et al. (2012), in order to deal with ambiguous words. The methodological basis is the joint use of lexical Correspondence Analysis and Network Analysis. Our idea is to investigate the neighborhood of ambiguous terms with respect to the different latent semantic components that emerge from a Correspondence Analysis of a training set of documents, in order to build rules that help solve WSD problems in the entire corpus.
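The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy document-by-term table, the function names `correspondence_analysis` and `neighborhoods`, and the choice to split documents by the sign of their first-axis coordinate are all assumptions made for illustration. Correspondence Analysis is computed in the usual way, through a singular value decomposition of the standardized residuals of the table.

```python
import numpy as np

def correspondence_analysis(table, n_axes=2):
    """Simple CA of a documents-by-terms count table via SVD."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row (document) masses
    c = P.sum(axis=0)                  # column (term) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of documents and terms on the first axes.
    doc_coords = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    term_coords = (Vt.T[:, :n_axes] * sv[:n_axes]) / np.sqrt(c)[:, None]
    return doc_coords, term_coords, sv[:n_axes] ** 2

def neighborhoods(table, terms, target, doc_axis):
    """Terms co-occurring with `target`, separately on each side of an axis.

    The side labels are arbitrary because the sign of an SVD axis is arbitrary.
    """
    t = terms.index(target)
    result = {}
    for label, mask in (("negative", doc_axis < 0), ("positive", doc_axis > 0)):
        docs = table[mask & (table[:, t] > 0)]   # documents containing target
        cooc = docs.sum(axis=0)                  # co-occurrence counts
        result[label] = {terms[j] for j in np.flatnonzero(cooc) if j != t}
    return result

# Hypothetical training set: two "finance" and two "river" documents.
terms = ["bank", "money", "loan", "river", "water"]
table = np.array([
    [2, 3, 2, 0, 0],
    [1, 2, 3, 0, 0],
    [2, 0, 0, 3, 2],
    [1, 0, 0, 2, 3],
])

doc_coords, term_coords, inertia = correspondence_analysis(table)
hoods = neighborhoods(table, terms, "bank", doc_coords[:, 0])
print(hoods)
```

In this toy table the first axis separates the two groups of documents, and the ambiguous term "bank" gets a different neighborhood on each side. Such neighborhoods could then serve as simple rules, for example by assigning an occurrence of "bank" to the sense whose neighborhood overlaps most with the surrounding words.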

Textual Data Analysis tools for Word Sense Disambiguation / Balbi, Simona; Stawinoga, Agnieszka Elzbieta. - 1:(2014), pp. 57-66. (Paper presented at the JADT 2014 conference, held in Paris, 3-6 June 2014).

Textual Data Analysis tools for Word Sense Disambiguation

Balbi, Simona; Stawinoga, Agnieszka Elzbieta
2014

ISBN: 9782954778112

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/592689