Francisco Casacuberta, Enrique Vidal, Juan M. Vilar. Architectures for speech-to-speech translation using finite-state models. Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems, 2002. pp. 39-44. ACL.

Speech-to-speech translation can be approached in much the same way as automatic speech recognition. Hidden Markov models (HMMs) serve as acoustic models of source-language words, and the mapping from source to target language is modeled by a finite-state transducer. The acoustic models, together with a suitable source language model, can be used to recognize input utterances, which are then translated into target-language sentences by the transducer. We call this conventional setting the "serial architecture". Alternatively, a more interesting "integrated architecture" can be adopted, in which the HMMs are integrated into the finite-state transducer. In this case, translation is performed by searching for an optimal path of states in the integrated network, and the output of the search is the target word sequence associated with that path. In both architectures, the HMMs can be trained from a source-language speech corpus, and the translation model can be learned automatically from a parallel text corpus. The experiments presented here cover speech-input translation from Spanish to English and from Italian to English, in applications involving telephone interaction between a customer and the front desk of a hotel.
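
The difference between the two architectures can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-word transducer, its costs, the function names, and above all the stand-in for the acoustic models, which is simply a per-segment table of word costs (as if HMM scores had already been computed over a known word segmentation). The serial recognizer here also omits the source language model for brevity.

```python
# Toy finite-state transducer: state -> list of arcs
# (source_word, target_words_emitted, arc_cost, next_state).
# All words, costs and states are invented illustration values.
START = 0
FINAL = {3}
ARCS = {
    0: [("una", ["a"], 0.4, 1), ("la", ["the"], 0.9, 1)],
    1: [("habitación", [], 0.1, 2)],  # output delayed until the next word
    2: [("doble", ["double", "room"], 0.3, 3),
        ("individual", ["single", "room"], 0.6, 3)],
}


def best_path(acoustic):
    """Viterbi-style search for the cheapest accepted path.

    `acoustic` is a list with one dict per word segment, mapping each
    candidate source word to a cost (a stand-in for a negative HMM
    log-likelihood). Returns (cost, source_words, target_words) or None.
    """
    hyps = {START: (0.0, [], [])}  # state -> best hypothesis reaching it
    for segment in acoustic:
        new_hyps = {}
        for state, (cost, src, tgt) in hyps.items():
            for word, out, arc_cost, nxt in ARCS.get(state, []):
                if word not in segment:
                    continue
                total = cost + arc_cost + segment[word]
                if nxt not in new_hyps or total < new_hyps[nxt][0]:
                    new_hyps[nxt] = (total, src + [word], tgt + out)
        hyps = new_hyps
    finals = [h for state, h in hyps.items() if state in FINAL]
    return min(finals, key=lambda h: h[0]) if finals else None


def serial_translate(acoustic):
    """Recognize first (best word per segment), then translate the text."""
    recognized = [min(segment, key=segment.get) for segment in acoustic]
    result = best_path([{word: 0.0} for word in recognized])
    return recognized, (result[2] if result else None)


def integrated_translate(acoustic):
    """Search acoustic and translation costs jointly in one network."""
    result = best_path(acoustic)
    return (result[1], result[2]) if result else (None, None)


clean = [{"una": 1.0, "la": 1.2}, {"habitación": 0.5},
         {"doble": 1.0, "individual": 1.1}]
# In the second segment, "doble" scores slightly better acoustically.
noisy = [{"una": 1.0, "la": 1.2}, {"habitación": 0.5, "doble": 0.4},
         {"doble": 1.0}]

print(serial_translate(clean))
print(integrated_translate(clean))
print(serial_translate(noisy))
print(integrated_translate(noisy))
```

In this toy setting the two functions agree on the clean input. On the noisy input the serial recognizer commits to a word sequence the transducer cannot accept, while the joint search still finds an accepted path. This only illustrates the structural difference and says nothing about the results reported in the paper.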