Article

Class-based n-gram models of natural language

Peter F. Brown (IBM T.J. Watson Research Center); P.V. deSouza (IBM T.J. Watson Research Center); Robert L. Mercer (IBM T.J. Watson Research Center); Vincent J. Della Pietra (IBM T.J. Watson Research Center); Jenifer C. Lai (IBM T.J. Watson Research Center)

1992, en

ABI

Abstract

We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their cooccurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

1 Introduction

In a number of natural language processing tasks, we face the problem of recovering a string of English words after it has been garbled by passage through a noisy channel. To tackle this problem successfully, we must be able to estimate the probability with which any particular string of English words will be presented as input to the noisy channel. In this paper, we discuss a method for making such estimates. We also discuss the related topic of assigning words to classes according to statisti...


Identifiers

DOI: 10.5555/176313.176316

Citations and references

Citations: 3

References used: 0

Metrics - AkademScholar