Parallel corpora segmentation by using anchor words.
Proceedings of the EACL 2003 Workshop on EAMT, 2003.

A new technique for monotone segmentation of parallel corpora is introduced. The segmentation is based on a manually defined set of anchor words, and the parallel segments are computed with a dynamic programming algorithm. To assess the technique, finite-state transducers are inferred from both the unsegmented and the segmented corpora. Experiments were carried out on Spanish-English and Italian-English translation tasks. The technique proved useful for improving the results relative to those obtained with the unsegmented corpora.
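The abstract does not spell out the dynamic programming objective, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the anchor words are given as a hypothetical bilingual dictionary (`anchors`, mapping each source anchor to its acceptable target translations) and that "monotone" means anchor matches must preserve word order on both sides. Under those assumptions, an LCS-style recurrence finds the largest order-preserving set of anchor matches, and each sentence pair is cut just before every matched anchor. The function names `match_anchors` and `segment_pair` are illustrative.

```python
from typing import Dict, List, Set, Tuple

Segment = Tuple[List[str], List[str]]


def match_anchors(src: List[str], tgt: List[str],
                  anchors: Dict[str, Set[str]]) -> List[Tuple[int, int]]:
    """Largest order-preserving set of (i, j) anchor matches (LCS-style DP).

    Assumption: src[i] and tgt[j] match when tgt[j] is listed as a
    translation of the source anchor src[i] in the hypothetical `anchors`.
    """
    n, m = len(src), len(tgt)
    # best[i][j] = max number of monotone anchor matches within src[i:], tgt[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            skip = max(best[i + 1][j], best[i][j + 1])
            if tgt[j] in anchors.get(src[i], set()):
                best[i][j] = max(skip, best[i + 1][j + 1] + 1)
            else:
                best[i][j] = skip

    # Trace back from (0, 0) to recover one optimal set of matches.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if (tgt[j] in anchors.get(src[i], set())
                and best[i][j] == best[i + 1][j + 1] + 1):
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i][j] == best[i + 1][j]:
            i += 1
        else:
            j += 1
    return pairs


def segment_pair(src: List[str], tgt: List[str],
                 anchors: Dict[str, Set[str]]) -> List[Segment]:
    """Cut a sentence pair before each matched anchor, keeping both sides non-empty."""
    segments: List[Segment] = []
    prev_i = prev_j = 0
    for i, j in match_anchors(src, tgt, anchors):
        # Skip a cut that would leave an empty side (only possible at the start).
        if i > prev_i and j > prev_j:
            segments.append((src[prev_i:i], tgt[prev_j:j]))
            prev_i, prev_j = i, j
    segments.append((src[prev_i:], tgt[prev_j:]))
    return segments


if __name__ == "__main__":
    src = "quiero una habitación doble y un taxi".split()
    tgt = "i want a double room and a taxi".split()
    for s, t in segment_pair(src, tgt, {"y": {"and"}}):
        print(" ".join(s), "|||", " ".join(t))
```

With the single anchor pair `y`/`and`, the example sentence pair is split into `quiero una habitación doble ||| i want a double room` and `y un taxi ||| and a taxi`. Because matches must be order-preserving, crossing anchor pairs cannot both be used, which is what keeps the segmentation monotone.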