Diego Linares, José-Miguel Benedí, Joan-Andreu Sánchez. Earley-based stochastic context-free grammar estimation from bracketed corpora and its use in a hybrid language model. Proceedings of the SEPLN: XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje, 2003. pp. 183-190.

In this paper, we study the problem of estimating Stochastic Context-Free Grammars (SCFGs) in general format and their use in a hybrid language model. We propose estimating an SCFG by means of a new bracketed version of the Earley algorithm. The hybrid language model is defined as a combination of a word-based n-gram, which captures the local relations between words, and a category-based SCFG, which represents the long-term relations between categories, together with a distribution of words into categories. Experiments on the UPenn Treebank corpus are reported, evaluated in terms of test-set perplexity and of word error rate in a speech recognition experiment.
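
As a rough illustration of how such a hybrid model could be scored, the sketch below assumes that the two components are combined by linear interpolation with a weight `alpha`. The abstract only says "combination", so the interpolation form, the function names, and the dictionary layout are all assumptions made for this example, not the authors' implementation. The sketch also shows the standard definition of test-set perplexity computed from per-word conditional probabilities.

```python
import math


def category_word_prob(cat_probs, word_given_cat, word):
    """Word probability from a category-based model (hypothetical helper).

    cat_probs:      {category: P(category | context)}, e.g. from an SCFG
    word_given_cat: {category: {word: P(word | category)}}
    Sums P(word | c) * P(c | context) over all categories c.
    """
    return sum(
        p_cat * word_given_cat.get(cat, {}).get(word, 0.0)
        for cat, p_cat in cat_probs.items()
    )


def hybrid_prob(p_ngram, p_scfg, alpha):
    """Linear interpolation of two conditional word probabilities.

    Assumption: the combination is alpha * n-gram + (1 - alpha) * SCFG.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_scfg


def perplexity(word_probs):
    """Perplexity of a test sequence given each word's conditional probability."""
    if not word_probs:
        raise ValueError("word_probs must not be empty")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))
```

In this sketch, `alpha = 1` reduces the model to the n-gram alone and `alpha = 0` to the category-based component alone; in practice such a weight would be tuned on held-out data.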