Publications

Abstract

Germán Sanchis-Trilles, Joan-Andreu Sánchez. Vocabulary Extension via POS Information for SMT. Mixing Approaches to Machine Translation, 2008.

One of the weaknesses of so-called phrase-based translation models is that they extract the phrase translation table blindly, i.e., they do not take into account the linguistic restrictions that each language imposes through its own syntax. On the other hand, Part of Speech (POS) tagging is a problem with a fairly mature state of the art, with error rates close to 2%. It therefore seems natural to use automatically POS-tagged corpora in Statistical Machine Translation (SMT) to incorporate syntactic knowledge and improve the results obtained by state-of-the-art SMT systems. In this work, we present results obtained on the EuroParl corpus by creating an extended vocabulary composed of the regular words with their POS tags concatenated to them.
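
As a rough illustration of what such an extended vocabulary could look like, the sketch below joins each word to its POS tag so that the same surface form with different tags becomes a different vocabulary entry. The function names, the "_" separator, and the Penn Treebank style tags are assumptions made for this example; the abstract does not specify the tagger, the tag set, or the exact token format used in the paper.

```python
from typing import List, Tuple


def extend_tokens(tagged: List[Tuple[str, str]], sep: str = "_") -> List[str]:
    """Concatenate each word with its POS tag (hypothetical token format)."""
    return [f"{word}{sep}{tag}" for word, tag in tagged]


def strip_tags(tokens: List[str], sep: str = "_") -> List[str]:
    """Recover plain words by dropping the tag after the last separator."""
    return [token.rsplit(sep, 1)[0] for token in tokens]


# A hand-tagged toy sentence; in practice the tags would come from an
# automatic POS tagger run over the training corpus.
tagged_sentence = [("the", "DT"), ("house", "NN"), ("is", "VBZ"), ("small", "JJ")]

extended = extend_tokens(tagged_sentence)
print(" ".join(extended))
print(" ".join(strip_tags(extended)))
```

Under these assumptions, a corpus rewritten this way could be passed to a standard phrase-based training pipeline unchanged, and the tags could be stripped from the system output before evaluation.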