Agglomerative Information Bottleneck 论文

1999引用 392

Bayesian Methods and Mixture ModelsAlgorithms and Data CompressionComplex Network Analysis Techniques

Algorithms and Data Compression Bayesian Methods and Mixture Models Complex Network Analysis Techniques

作者

摘要

We introduce a novel distributional clustering algorithm that explicitly maximizes the mutual information per cluster between the data and given categories. This algorithm can be considered as a bottom up hard version of the recently introduced &quot;Information Bottleneck Method&quot;. We relate the mutual information between clusters and categories to the Bayesian classification error, which provides another motivation for using the obtained clusters as features. The algorithm is compared with the top-down soft version of the information bottleneck method and a relationship between the hard and soft results is established. We demonstrate the algorithm on the 20 Newsgroups data set. For a subset of two news-groups we achieve compression by 3 orders of magnitudes loosing only 10% of the original mutual information. 1 Introduction The problem of self-organization of the members of a set X based on the similarity of the conditional distributions of the members of another set, Y , fp(yjx)g, was fir...

作者查看全部 (2)

Naftali Tishby

Noam Slonim

Agglomerative Information Bottleneck 论文

摘要

作者查看全部 (2)

相关技术查看全部 (3)

相关事件

相关文章