Publications

Abstract

Diego Linares, José-Miguel Benedí, Joan-Andreu Sánchez. A Hybrid Language Model based on Stochastic Context-Free Grammars. ECML/PKDD 2003 Workshop on Learning Context-Free Grammars, 2003. Colin de la Higuera, Pieter W. Adriaans, Menno van Zaanen, Jose Oncina (Editors). pp. 41-52.

This paper explores the use of initial Stochastic Context-Free Grammars (SCFGs), obtained from a treebank corpus, as the starting point for learning SCFGs by means of estimation algorithms. A hybrid language model is defined as a combination of a word-based n-gram, which captures the local relations between words, and a category-based SCFG with a distribution of words into categories, which represents the long-term relations between these categories. Experiments on the UPenn Treebank corpus are reported. The models are evaluated in terms of test set perplexity and, in a speech recognition experiment, word error rate.
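
The abstract does not say how the two components are combined. A minimal sketch, assuming a linear interpolation with a single weight `alpha`, shows how a hybrid word probability and the resulting test set perplexity could be computed. The function names, the weight value, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def hybrid_prob(p_ngram, p_scfg, alpha):
    """Linearly interpolate two conditional word probabilities.

    Assumption: the combination is a weighted sum; the abstract only
    says the two models are combined.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg


def perplexity(word_probs):
    """Per-word perplexity: exp of the negative mean log-probability."""
    if not word_probs:
        raise ValueError("word_probs must not be empty")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Toy per-word probabilities for a three-word test sentence (made up).
ngram_probs = [0.20, 0.05, 0.10]  # from a hypothetical word n-gram model
scfg_probs = [0.10, 0.15, 0.10]   # from a hypothetical category-based SCFG
alpha = 0.7                       # hypothetical interpolation weight

hybrid_probs = [hybrid_prob(n, g, alpha)
                for n, g in zip(ngram_probs, scfg_probs)]
print(perplexity(ngram_probs), perplexity(hybrid_probs))
```

In a setup like this, `alpha` would normally be tuned on held-out data; `alpha = 1.0` reduces the hybrid model to the plain n-gram.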