Corpora e varietà minoritarie: le isole walser in Italia / Angster, Marco; Cioffi, Raffaele; Bellante, Marco; Gaeta, Livio. - In: RID, RIVISTA ITALIANA DI DIALETTOLOGIA. - ISSN 1122-6331. - 44:(2020), pp. 107-125.
Corpora e varietà minoritarie: le isole walser in Italia
Raffaele Cioffi; Livio Gaeta
2020
Abstract
This paper considers the problems of building a corpus of a low-density variety in the light of two projects, DiWaC and ArchiWals, designed to preserve the linguistic and cultural heritage of the Walser German communities of Piedmont and the Aosta Valley. It argues that similar problems affect work on both spoken and written data of low-density varieties. On the one hand, low-density varieties are defined by the absence or scarcity of ready-to-use language resources for automatic processing. On the other hand, both written and spoken data of such varieties are characterised by a high degree of granularity at different levels. The solutions proposed for DiWaC and ArchiWals attempt to reconcile computability and granularity by stratifying the information retrieved from the original texts that make up the corpora.

File | Size | Format
---|---|---
2020_RID.pdf (open access; type: publisher's version (PDF); licence: not specified) | 300.23 kB | Adobe PDF