With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks, with the goal of assessing which languages or which domains are more “difficult” to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the standard labeled precision and recall measures. As an alternative, in this article we propose an information theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage over standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always being accompanied by a degradation in parsing accuracy.

An Information Theoretic Measure to Evaluate Parsing Difficulty Across Treebanks / Corazza, Anna; Lavelli, Alberto; Satta, Giorgio. - In: ACM TRANSACTIONS ON SPEECH AND LANGUAGE PROCESSING. - ISSN 1550-4875. - 9:4(2013), pp. 1-31. [10.1145/2407736.2407737]

An Information Theoretic Measure to Evaluate Parsing Difficulty Across Treebanks

CORAZZA, ANNA
2013

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/509760