On the Impact of Location-related Terms in Neural Embeddings for Content Similarity Measures in Cultural Heritage Recommender Systems / Origlia, Antonio; Di Martino, Sergio. - 13238:(2022), pp. 108-120. (Paper presented at the conference Web and Wireless Geographical Information Systems, W2GIS 2022, held in Konstanz, DE, 27-29 April 2022) [10.1007/978-3-031-06245-2_10].
On the Impact of Location-related Terms in Neural Embeddings for Content Similarity Measures in Cultural Heritage Recommender Systems
Origlia, Antonio;Di Martino, Sergio
2022
Abstract
Analysing text to detect semantic similarities is a recent breakthrough in Natural Language Processing that has enabled many novel applications in different fields. One domain that could greatly benefit from this innovation is that of Location-based and/or Touristic Recommender Systems, where users receive suggestions based on items they previously liked. In this work, we consider the use of neural embeddings weighted using Smooth-Inverse Frequency (SIF) to detect semantic similarities in textual descriptions found in a large graph database covering Italian cultural Points of Interest (POIs). Of all similar pairs detected on a national scale, 19% are composed of POIs that do not belong to the same ontological category, highlighting the potential of neural embeddings to match POIs beyond the categories they have been assigned to. However, since text descriptions also contain references to the places where POIs are found, similarities can be detected among POIs sharing the same location, especially in the case of low-frequency geographical terms. While this may be desirable in some cases, it may harm location-aware applications, as POI positions are already known. By comparing city-name occurrence probabilities in the full text corpus and in location-constrained sub-corpora, we observed probability shifts of 232% on average. This suggests that, for the specific case of location-aware services, SIF-weighted neural embeddings should use location-constrained sub-corpora for term occurrence probability computation in order to efficiently remove uninteresting information.
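The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual SIF weight a / (a + p(w)) with the commonly used smoothing value a = 1e-3, a toy six-description corpus invented for illustration, whitespace tokenisation, and a hypothetical definition of the probability shift as the relative change between sub-corpus and full-corpus probabilities. The record does not state how the reported 232% figure was computed.

```python
from collections import Counter


def term_probabilities(docs):
    """Unigram occurrence probabilities over lowercased, whitespace-split docs."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def sif_weight(p, a=1e-3):
    """SIF weight a / (a + p); the value of `a` is an assumed default."""
    return a / (a + p)


def relative_shift(p_full, p_sub):
    """Relative change of p_sub with respect to p_full, in percent (assumed definition)."""
    return (p_sub - p_full) / p_full * 100.0


# Toy descriptions, invented for illustration only.
full_corpus = [
    "baroque church in naples",
    "royal palace in naples",
    "roman amphitheatre in verona",
    "medieval castle in verona",
    "renaissance villa near florence",
    "gothic cathedral in milan",
]

# Location-constrained sub-corpus: only descriptions mentioning the city.
naples_sub = [d for d in full_corpus if "naples" in d.split()]

p_full = term_probabilities(full_corpus)["naples"]
p_sub = term_probabilities(naples_sub)["naples"]

shift = relative_shift(p_full, p_sub)
w_full = sif_weight(p_full)
w_sub = sif_weight(p_sub)

print(f"p_full={p_full:.4f} p_sub={p_sub:.4f} shift={shift:.1f}%")
print(f"SIF weight: full corpus={w_full:.5f}, sub-corpus={w_sub:.5f}")
```

In this toy setting the city name is rare nationally but frequent within its own sub-corpus, so its SIF weight drops when probabilities are estimated locally. This is the mechanism by which location terms would contribute less to similarity scores among POIs that are already known to share a location.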