CorDis is a large, XML, TEI-conformant, POS-tagged, multimodal, multigenre corpus representing a significant portion of the political and media discourse on the 2003 Iraqi conflict. It was generated from different sub-corpora which had been assembled by various research groups, ranging from official transcripts of Parliamentary sessions, both in the US and the UK, to the transcripts of the Hutton Inquiry, from American and British newspaper coverage of the conflict to White House press briefings and to transcriptions of American and British TV news programmes. The heterogeneity of the data, the specificity of the genres and the diverse discourse analytical purposes of different groups had led to a wide range of coding strategies being employed to make textual and meta-textual information retrievable.The main purpose of this paper is to show the process of harmonisation and integration whereby a loose collection of texts has become a stable architecture. The TEI proved a valid instrument to achieve standardisation of mark-up. The guidelines provide for a hierarchical organisation which gives the corpus a sound structure favouring replicability and enhancing the reliability of research. In discussing some examples of the problems encountered in the annotation, we will deal with issues like consistency and re-usability, and will examine the constraints imposed on data handling by specific research objectives. Examples include the choice to code the same speakers in different ways depending on the various (institutional) roles they may assume throughout the corpus, the distinction between quotations of spoken or written discourse and quotations read aloud in the course of a spoken text, and the segmentation of portions of news according to participants interaction and use of camera/voiceover.

Xaira and the marking-up of the corpus / Venuti, Marco; L., Cirillo; A., Marchi. - (2007). (Intervento presentato al convegno Cordis Colloquium tenutosi a Università di Siena, Pontignano, Siena nel 26-28 gennaio 2007).

Xaira and the marking-up of the corpus

VENUTI, MARCO;
2007

Abstract

CorDis is a large, XML, TEI-conformant, POS-tagged, multimodal, multigenre corpus representing a significant portion of the political and media discourse on the 2003 Iraqi conflict. It was generated from different sub-corpora which had been assembled by various research groups, ranging from official transcripts of Parliamentary sessions, both in the US and the UK, to the transcripts of the Hutton Inquiry, from American and British newspaper coverage of the conflict to White House press briefings and to transcriptions of American and British TV news programmes. The heterogeneity of the data, the specificity of the genres and the diverse discourse analytical purposes of different groups had led to a wide range of coding strategies being employed to make textual and meta-textual information retrievable.The main purpose of this paper is to show the process of harmonisation and integration whereby a loose collection of texts has become a stable architecture. The TEI proved a valid instrument to achieve standardisation of mark-up. The guidelines provide for a hierarchical organisation which gives the corpus a sound structure favouring replicability and enhancing the reliability of research. In discussing some examples of the problems encountered in the annotation, we will deal with issues like consistency and re-usability, and will examine the constraints imposed on data handling by specific research objectives. Examples include the choice to code the same speakers in different ways depending on the various (institutional) roles they may assume throughout the corpus, the distinction between quotations of spoken or written discourse and quotations read aloud in the course of a spoken text, and the segmentation of portions of news according to participants interaction and use of camera/voiceover.
2007
Xaira and the marking-up of the corpus / Venuti, Marco; L., Cirillo; A., Marchi. - (2007). (Intervento presentato al convegno Cordis Colloquium tenutosi a Università di Siena, Pontignano, Siena nel 26-28 gennaio 2007).
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11588/319650
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact