Learning Between the Lines: Interactive Learning Modules Within Corpus Design / DI MARO, Maria; Origlia, Antonio; Cutugno, Francesco. - (2021), pp. 321-329.

Learning Between the Lines: Interactive Learning Modules Within Corpus Design

Di Maro, Maria; Origlia, Antonio; Cutugno, Francesco
2021

Abstract

This paper reports on the advantages of learning inferences and understanding strategies from the interactive structure of a corpus. We first introduce the SUGAR corpus for the cooking domain and describe its distinctive collection and annotation procedures. We then show how the information contained in the corpus can be used to improve action interpretation in dialogue systems, for example through linguistic elements or related lexical units acquired from a linked database or from rephrasing strategies within the corpus itself. In all AI-based approaches that depend on training with large and representative corpora, the probability of correctly predicting the creative ways in which a speaker may use language is lower than expected. Trying to capture most of the words and expressions a speaker could use is necessary, but even an empirical, finite collection of cases may not be enough. For this reason, the use of our corpus, possibly in combination with online training, appears to be an appealing solution.
ISBN: 9811593221

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/963029