A novel approach to query expansion based on semantic similarity measures / Amato, Flora; De Santo, Aniello; Gargiulo, Francesco; Moscato, Vincenzo; Persia, Fabio; Picariello, Antonio; Sperli', Giancarlo. - (2015), pp. 344-353. (Paper presented at the 4th International Conference on Data Management Technologies and Applications, DATA 2015, held in Colmar (Alsace, France), July 20-22, 2015).
A novel approach to query expansion based on semantic similarity measures
AMATO, FLORA;MOSCATO, VINCENZO;PICARIELLO, ANTONIO;SPERLI', GIANCARLO
2015
Abstract
In this paper, we present a framework that supports information retrieval over document corpora using an automatic semantic query expansion approach. The main idea is to expand the set of words used as query terms by exploiting the semantic similarity between the concepts related to the search terms. We leverage existing lexical resources and similarity metrics computed among terms to generate, through a suitable mapping into a vector space, an index for the fast retrieval of a set of terms "semantically correlated" to a given query term. The vector of expanded terms is then used at query time to retrieve documents that are significantly related to specific combinations of the query terms. Preliminary experimental results on the efficiency and effectiveness of the proposed approach are reported and discussed.
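The expansion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity table, the function names (`build_index`, `expand_query`, `score_document`), the top-k cutoff and the threshold are all hypothetical, and a plain dictionary stands in for the vector-space index the paper builds from lexical resources.

```python
from collections import defaultdict

# Hypothetical similarity scores in [0, 1]. A real system would derive them
# from a lexical resource and a semantic similarity metric.
SIMILARITY = {
    ("car", "automobile"): 0.95,
    ("car", "vehicle"): 0.80,
    ("car", "truck"): 0.60,
    ("car", "banana"): 0.05,
    ("engine", "motor"): 0.90,
}


def build_index(pairs):
    """Map each term to its neighbours, sorted by decreasing similarity."""
    index = defaultdict(list)
    for (a, b), score in pairs.items():
        index[a].append((b, score))
        index[b].append((a, score))
    for term in index:
        index[term].sort(key=lambda item: item[1], reverse=True)
    return index


def expand_query(terms, index, k=2, threshold=0.5):
    """Return {term: weight}. Original terms weigh 1.0; each expansion
    term weighs its similarity to the query term that produced it."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        # Keep only the k most similar neighbours above the threshold.
        for neighbour, score in index.get(t, [])[:k]:
            if score >= threshold:
                weights[neighbour] = max(weights.get(neighbour, 0.0), score)
    return weights


def score_document(text, weights):
    """Sum the weights of the expanded terms that occur in the document."""
    tokens = set(text.lower().split())
    return sum(w for term, w in weights.items() if term in tokens)


index = build_index(SIMILARITY)
expanded = expand_query(["car"], index)
docs = ["a red automobile parked outside", "banana bread recipe"]
ranked = sorted(docs, key=lambda d: score_document(d, expanded), reverse=True)
```

In this toy setting the query `car` is expanded with `automobile` and `vehicle`, so a document that mentions only "automobile" is still retrieved, while an unrelated document scores zero. Weighting expansion terms by their similarity is one simple design choice among many; the paper's own scoring may differ.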