This paper reports on the advantages of learning inferences and understanding strategies from the interactive structure of a corpus. We first introduce the SUGAR corpus for the cooking domain and describe its distinctive collection and annotation procedures. We then show how information contained in the corpus can be used to improve action interpretation in dialogue systems, for example through linguistic elements or related lexical units acquired from a linked database or from rephrasing strategies within the corpus itself. In all AI-based approaches that depend on training with large, representative corpora, the probability of correctly anticipating the creativity speakers show in using language is lower than expected. Trying to capture most of the words and expressions a speaker could use is necessary, but even an empirical collection of cases is finite and may not be enough. For this reason, using our corpus, possibly in combination with online training, appears to be an appealing solution.
Learning Between the Lines: Interactive Learning Modules Within Corpus Design / Di Maro, Maria; Origlia, Antonio; Cutugno, Francesco (2021), pp. 321-329.
Learning Between the Lines: Interactive Learning Modules Within Corpus Design
Di Maro, Maria; Origlia, Antonio; Cutugno, Francesco
2021