A large Spanish-Catalan parallel corpus release for machine translation

Author
Costa-jussà, Marta R.; Fonollosa, José A. R.; Mariño, J.B.; Poch, M.; Farrus, M.
Type of activity
Journal article
Journal
Computing and Informatics
Date of publication
2014-01-01
Volume
33
Number
4
First page
907
Last page
920
URL
http://cai.type.sk/content/2014/4/a-large-spanish-catalan-parallel-corpus-release-for-machine-translation/
Abstract
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) in ca...
Keywords
Catalan-Spanish parallel corpus, machine translation
Group of research
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications
VEU - Speech Processing Group

Participants