Using the Web to Obtain Frequencies for Unseen Bigrams (Paper)
2003 | Computational Linguistics | Citations: 357 | Top venue
Details
- Published in
- Computational Linguistics
- Publication date
- 2003-09-01
- Publication year
- 2003
Keywords
Natural Language Processing Techniques; Topic Modeling; Semantic Web and Ontologies
Abstract
This article shows that the Web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the Web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between Web frequencies and corpus frequencies; (b) a reliable correlation between Web frequencies and plausibility judgments; (c) a reliable correlation between Web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of Web frequencies in a pseudo disambiguation task.
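Evaluation step (a) in the abstract can be illustrated with a minimal sketch: correlate Web counts with corpus counts for the same bigrams. Everything below is an assumption made for illustration and is not taken from the paper: the log transform, the `floor` value substituted for zero counts, the function names, and the toy counts, which are invented and are not results.

```python
import math


def log_counts(counts, floor=0.5):
    """Log-transform raw counts.

    `floor` is a hypothetical stand-in for zero counts so the log is defined;
    the value 0.5 is an assumption of this sketch.
    """
    return [math.log10(c if c > 0 else floor) for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy counts for three bigrams (illustration only, not real data).
web_counts = [1000, 100, 10]
corpus_counts = [100, 10, 1]

r = pearson(log_counts(web_counts), log_counts(corpus_counts))
```

With these toy counts the two log series differ only by a constant offset, so `r` comes out at 1 up to floating-point error. Real counts would give a lower value, and the same function could be reused for steps (b) and (c) by swapping in plausibility ratings or smoothed frequencies.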