Clustering Sequences with Hidden Markov Models (paper)

1996, cited by 365
Bayesian Methods and Mixture Models; Algorithms and Data Compression; Advanced Clustering Algorithms Research

Abstract

This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed. Second, the more difficult problem of determining the number of clusters $K$ from the data is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.

1 Introduction

Consider a data set $D$ consisting of $N$ sequences, $D = \{S_1, \ldots, S_N\}$. $S_i = (x^i_1, \ldots, x^i_{L_i})$ is a sequence of length $L_i$ composed of potentially multivariate feature vectors $x$. The problem addressed in this paper is the discovery from data of a natural grouping of the sequences into $K$ clusters. This is analogous to clustering in multivariate feature space which is normally handled by m...
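The mixture-model framing described above can be illustrated with a small sketch. It assumes discrete-output HMMs with known parameters and a fixed $K = 2$, whereas the paper allows multivariate feature vectors and estimates both the parameters and $K$ from data. The function names `log_likelihood` and `memberships`, the tuple layout `(start, trans, emit)`, and all numeric values are hypothetical choices made for illustration. The sketch does not implement the paper's initialization procedure or its method for choosing $K$. It only shows how each sequence receives a posterior cluster membership once the component HMMs are given.

```python
import math


def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: log p(seq | HMM) for a discrete-output HMM.

    Assumes seq is non-empty and has nonzero probability under the model.
    """
    n_states = len(start)
    # alpha[j] is proportional to p(x_1..x_t, state_t = j)
    alpha = [start[j] * emit[j][seq[0]] for j in range(n_states)]
    log_p = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][seq[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def memberships(seq, weights, hmms):
    """Posterior p(cluster k | seq) under a K-component mixture of HMMs."""
    logs = [
        math.log(w) + log_likelihood(seq, *hmm)
        for w, hmm in zip(weights, hmms)
    ]
    top = max(logs)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two hypothetical 2-state HMMs over the symbols {0, 1}.
emit = [[0.9, 0.1], [0.1, 0.9]]
hmm_sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit)    # states persist
hmm_flipping = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit)  # states alternate
weights = [0.5, 0.5]  # mixture weights, assumed equal here

post_runs = memberships([0, 0, 0, 0, 1, 1, 1, 1], weights, [hmm_sticky, hmm_flipping])
post_alt = memberships([0, 1, 0, 1, 0, 1, 0, 1], weights, [hmm_sticky, hmm_flipping])
print(post_runs, post_alt)
```

A sequence with long runs of the same symbol is assigned mostly to the persistent-state component, and an alternating sequence mostly to the other. In a full clustering procedure these memberships would be one step of an iterative fit in which the HMM parameters are re-estimated as well.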