Martha-Alicia Rocha, Joan-Andreu Sánchez. Machine Translation of the Penn Treebank to Spanish. I Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages, 2009.

In this work we explored the problem of translating the Penn Treebank corpus into Spanish. For this problem, we considered Phrase-based Machine Translation techniques. Since no parallel training data exists for this corpus, we used a large out-of-domain training data set and a small "high-quality" in-domain training data set. We studied simple and effective Domain Adaptation techniques that have been used for other applications. We report experiments on a small test set of sentences manually translated from the Penn Treebank corpus.