José-Miguel Benedí, Joan-Andreu Sánchez. Combinations of n-grams and stochastic context-free grammars for language modeling. International Conference on Computational Linguistics, 2000. pp. 55-61.

This paper describes a hybrid proposal that combines n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is used to represent the long-term relations between syntactic structures. To define this grammatical model for large-vocabulary complex tasks, a category-based SCFG and a probabilistic model of word distribution in the categories are proposed. Methods for learning these stochastic models for complex tasks are described, and algorithms for computing the word transition probabilities are also presented. Finally, in experiments on the Penn Treebank corpus, the hybrid model improved the test set perplexity by 30% with respect to classical n-gram models.
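
The abstract does not state how the two models are combined, so the following is only a minimal sketch that assumes a simple linear interpolation of the two conditional word probabilities. The function names, the weight `alpha`, and all probability values are hypothetical and chosen for illustration; they do not reproduce the paper's method or its results. The sketch also shows how test set perplexity is computed from per-word probabilities.

```python
import math


def interpolate(p_ngram, p_grammar, alpha):
    """Linearly interpolate two conditional word probabilities.

    alpha weights the n-gram model; (1 - alpha) weights the grammar model.
    This combination rule is an assumption, not taken from the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_ngram + (1.0 - alpha) * p_grammar


def perplexity(word_probs):
    """Perplexity of a word sequence given its per-word probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Made-up per-word probabilities for a four-word test sentence.
ngram_probs = [0.10, 0.05, 0.20, 0.02]    # hypothetical n-gram model
grammar_probs = [0.08, 0.15, 0.10, 0.12]  # hypothetical SCFG-based model

hybrid_probs = [
    interpolate(n, g, alpha=0.5)
    for n, g in zip(ngram_probs, grammar_probs)
]

print("n-gram perplexity:", perplexity(ngram_probs))
print("hybrid perplexity:", perplexity(hybrid_probs))
```

In this toy setting the hybrid perplexity is lower because the grammar model assigns more mass to the words the n-gram model finds unlikely; in practice the weight would be tuned on held-out data.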