This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce,by highlighting extant obstacles and identifying relevant technologies to overcome them. In this sense, it firstly proposes a typology of e-commerce static and dynamic textual genres and it identifies those that may be more successfully targeted by SMT. The specific challenges concerning the automatic translation of user-generated content are discussed in detail. Secondly, the paper highlights the risk of data sparsity inherent to e-commerce and it explores the state-of-the-art strategies to achieve domain adequacy via adaptation. Thirdly, it proposes a robust workflow for the development of SMT systems adapted to the e-commerce domain by relying on inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes to obtain monolingual target-language data to train language models and aligned parallel corpora to tune and evaluate MT systems by means of crowdsourcing.
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities / Meritxell, Fernández-Barrera; Vladimir, Popescu; Antonio, Toral; Gaspari, F; Khalid, Choukri. - (2016), pp. 4550-4556. (Intervento presentato al convegno 10th Language Resources and Evaluation Conference tenutosi a Portorož, Slovenia nel 23-28 May 2016).
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
Gaspari F;
2016
Abstract
This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce,by highlighting extant obstacles and identifying relevant technologies to overcome them. In this sense, it firstly proposes a typology of e-commerce static and dynamic textual genres and it identifies those that may be more successfully targeted by SMT. The specific challenges concerning the automatic translation of user-generated content are discussed in detail. Secondly, the paper highlights the risk of data sparsity inherent to e-commerce and it explores the state-of-the-art strategies to achieve domain adequacy via adaptation. Thirdly, it proposes a robust workflow for the development of SMT systems adapted to the e-commerce domain by relying on inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes to obtain monolingual target-language data to train language models and aligned parallel corpora to tune and evaluate MT systems by means of crowdsourcing.File | Dimensione | Formato | |
---|---|---|---|
PDFsam_merge.pdf
non disponibili
Dimensione
328.89 kB
Formato
Adobe PDF
|
328.89 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.