The Integration of Phonetic Knowledge in Speech Technology. Moisés Pastor-i-Gadea, Francisco Casacuberta. 2004.

The great variability of word pronunciations in spontaneous speech is one of the reasons for the low performance of current speech recognition systems. Generating dictionaries that take this variability into account may increase the robustness of such systems. A word pronunciation is a possible phoneme-like sequence that can appear in a real utterance and represents a possible acoustic production of the word. In this paper, word pronunciations are modeled using stochastic finite-state automata. The use of such models allows the application of grammatical inference methods and easy integration with other knowledge sources. The training samples are obtained from the alignment between the phoneme-like decoding of each training utterance and the corresponding canonical transcription. The models proposed in this work were applied to a translation-oriented speech task. The improvements achieved by these new models ranged from 0.6 to 2.7 points, depending on the language model used.
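
As a rough illustration of what a stochastic finite-state pronunciation model can look like, the sketch below estimates a probabilistic prefix-tree automaton for a single word from a handful of pronunciation samples. This is not the authors' method: the prefix tree is only a common starting point for grammatical inference, and the abstract does not say which inference algorithm was used. The function names, the `END` marker, and the toy samples are all hypothetical, and the samples are assumed to have been extracted already by aligning the phoneme-like decoding with the canonical transcription.

```python
from collections import defaultdict

END = "</w>"  # hypothetical end-of-pronunciation marker, not from the paper


def train_prefix_tree(samples):
    """Estimate a stochastic prefix-tree automaton from pronunciation samples.

    States are phoneme prefixes (tuples). Each transition probability is the
    relative frequency of a symbol among all samples that reached that state.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for phones in samples:
        state = ()
        for symbol in list(phones) + [END]:
            counts[state][symbol] += 1
            state = state + (symbol,)
    model = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        model[state] = {sym: n / total for sym, n in outgoing.items()}
    return model


def pronunciation_probability(model, phones):
    """Probability that the automaton generates exactly this phoneme sequence."""
    prob = 1.0
    state = ()
    for symbol in list(phones) + [END]:
        # Unseen states or symbols get probability 0 (no smoothing in this sketch).
        prob *= model.get(state, {}).get(symbol, 0.0)
        state = state + (symbol,)
    return prob


# Made-up samples for one word: three full pronunciations and one reduced form.
samples = [
    ["p", "a", "r", "a"],
    ["p", "a", "r", "a"],
    ["p", "a", "r", "a"],
    ["p", "a"],
]
model = train_prefix_tree(samples)
full_prob = pronunciation_probability(model, ["p", "a", "r", "a"])
reduced_prob = pronunciation_probability(model, ["p", "a"])
unseen_prob = pronunciation_probability(model, ["p", "r", "a"])
```

With these toy samples the model assigns most of the probability mass to the full form and the remainder to the reduced form, while a sequence never seen in training gets zero. A practical system would add smoothing and would typically merge states to generalize beyond the observed samples.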