Cluster-based language models for distributed retrieval (paper)

1999 · Cited by 296

Topics: Information Retrieval and Search Behavior; Topic Modeling; Web Data Mining and Analysis



Abstract

Effective retrieval in a distributed environment is an important but difficult problem. The lack of effectiveness appears to have three causes. First, collection selection based on word histograms is not appropriate for heterogeneous collections. Second, relevant documents are scattered over many collections, so searching only a few collections misses many relevant documents. Third, most existing collection selection metrics lack sound theoretical justification and hence may not be well tuned to the problem. We propose a new approach to distributed retrieval based on document clustering and language modeling. Document clustering is used to organize collections around topics. Language modeling is used to properly represent topics and effectively select the right topics for a query. Based on these ideas, three methods are proposed to suit different environments. We show that all three methods improve the effectiveness of distributed retrieval.

1 Introduction

Information has become highly distribut...
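
The abstract does not give the topic selection formula. As a rough illustration only, the sketch below assumes documents have already been clustered into topics and ranks those topics for a query by smoothed query likelihood: each topic's unigram language model is interpolated with a background model built from all topics (Jelinek-Mercer smoothing). This is a standard language-modeling technique chosen for illustration, not necessarily the authors' selection metric; the names `build_topic_models`, `rank_topics`, and the weight `lam` are hypothetical.

```python
import math
from collections import Counter


def build_topic_models(topics):
    """Map each topic name to (term counts, total term count) over its documents.

    `topics` is a hypothetical input: {topic_name: [document text, ...]},
    i.e. the output of some earlier document clustering step.
    """
    models = {}
    for name, docs in topics.items():
        counts = Counter(word for doc in docs for word in doc.lower().split())
        models[name] = (counts, sum(counts.values()))
    return models


def rank_topics(query, models, lam=0.5):
    """Rank topics by log P(query | topic) with Jelinek-Mercer smoothing.

    Each topic model is interpolated with a background model built from
    all topics combined, so a query term missing from one topic does not
    zero out that topic's score. `lam` weights the topic model.
    """
    background = Counter()
    for counts, _ in models.values():
        background.update(counts)
    bg_total = sum(background.values())

    scores = {}
    for name, (counts, total) in models.items():
        score = 0.0
        for term in query.lower().split():
            p_topic = counts[term] / total if total else 0.0
            p_bg = background[term] / bg_total if bg_total else 0.0
            p = lam * p_topic + (1 - lam) * p_bg
            if p == 0.0:
                # Term unseen in every topic: it would penalize all topics
                # equally, so skipping it does not change the ranking.
                continue
            score += math.log(p)
        scores[name] = score
    # Highest log-likelihood first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    topics = {
        "sports": ["the team won the match", "a late goal decided the match"],
        "finance": ["the bank raised interest rates", "stock prices fell on rate fears"],
    }
    models = build_topic_models(topics)
    print(rank_topics("interest rates", models)[0][0])  # finance
```

In a distributed setting, only the top-ranked topics (clusters) would then be searched; smoothing with a background model is what keeps a topic that covers most but not all query terms from being discarded outright.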

Authors

W. Bruce Croft

Jinxi Xu
