An estimate of an upper bound for the entropy of English (Paper)

1992 | The COCOON platform (University of Paris) | Cited by 354

Algorithms and Data Compression; Natural Language Processing Techniques; Cellular Automata and Applications

Abstract

We present an estimate of an upper bound of 1.75 bits for the entropy of characters in printed English, obtained by constructing a word trigram model and then computing the cross-entropy between this model and a balanced sample of English text. We suggest the well-known and widely available Brown Corpus of printed English as a standard against which to measure progress in language modeling and offer our bound as the first of what we hope will be a series of steadily decreasing bounds.
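The quantity described in the abstract, the cross-entropy between a word trigram model and a text sample expressed in bits per character, can be illustrated with a minimal sketch. This is not the authors' model: it assumes whitespace tokenization, add-k smoothing, and a single `<unk>` token for unseen words, and the names `train_trigram` and `bits_per_char` are hypothetical. Because mapping unseen words to `<unk>` does not pay for spelling them out, the number this toy returns is not a strict upper bound on entropy; it only shows how total model bits are divided by the character count.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for words not seen in training (assumption of this sketch)
BOS = "<s>"    # padding symbol so the first words have a two-word context


def train_trigram(tokens, k=1.0):
    """Build an add-k smoothed word trigram model from a list of tokens."""
    vocab = set(tokens) | {UNK}
    padded = [BOS, BOS] + list(tokens)
    tri = Counter()  # counts of (w1, w2, w3)
    ctx = Counter()  # counts of the context (w1, w2)
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        tri[(a, b, c)] += 1
        ctx[(a, b)] += 1
    v = len(vocab)

    def prob(a, b, c):
        # Add-k smoothing: sums to 1 over the vocabulary for every context.
        return (tri[(a, b, c)] + k) / (ctx[(a, b)] + k * v)

    return vocab, prob


def bits_per_char(text, vocab, prob):
    """Cross-entropy of the model on `text`, in bits per character."""
    if not text:
        raise ValueError("text must be non-empty")
    mapped = [t if t in vocab else UNK for t in text.split()]
    padded = [BOS, BOS] + mapped
    total_bits = 0.0
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        total_bits -= math.log2(prob(a, b, c))
    # Characters are counted on the raw text, spaces included.
    return total_bits / len(text)


train_text = "the cat sat on the mat and the dog sat on the log"
vocab, prob = train_trigram(train_text.split())
print(bits_per_char("the dog sat on the mat", vocab, prob))
```

A lower value means the model predicts the sample better, which is why the abstract can frame progress in language modeling as a series of decreasing bounds measured on one fixed corpus.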

Authors (4 of 5 listed)

Peter F. Brown

Jennifer C. Lai

Stephen A. Della Pietra

Robert L. Mercer
