Sequence mining in categorical domains (paper)

2000 · Citations: 250
Data Mining Algorithms and Applications; Rough Sets and Fuzzy Logic; Data Management and Algorithms

Abstract

We present cSPADE, an efficient algorithm for mining frequent sequences under a variety of syntactic constraints. These take the form of length or width limitations on the sequences, minimum or maximum gap constraints on consecutive sequence elements, a time window on allowable sequences, item constraints, and the discovery of sequences predictive of one or more classes, even rare ones. Our method is efficient and scalable. Experiments on a number of synthetic and real databases show the utility and performance of considering such constraints on the set of mined sequences.
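
To make the gap and time-window constraints concrete, the following is a minimal sketch of a brute-force occurrence check and support count. It is not the cSPADE algorithm itself. The function names `occurs` and `support`, the parameter names `min_gap`, `max_gap`, and `window`, and the exact conventions (gaps measured as differences between event timestamps, bounds inclusive, `min_gap` at least 1) are assumptions made for illustration; the paper's own definitions may differ.

```python
from typing import FrozenSet, List, Optional, Tuple

# One event: (timestamp, items observed at that time). Each input sequence
# is a list of events sorted by timestamp. (Illustrative representation.)
Event = Tuple[int, FrozenSet[str]]


def occurs(events: List[Event], pattern: List[FrozenSet[str]],
           min_gap: int = 1, max_gap: Optional[int] = None,
           window: Optional[int] = None) -> bool:
    """Return True if `pattern` occurs in `events` under the constraints.

    Assumed conventions: consecutive matched events differ in time by at
    least `min_gap` (>= 1) and at most `max_gap`; the last matched event is
    at most `window` time units after the first.
    """
    def search(k: int, prev_time: Optional[int],
               first_time: Optional[int]) -> bool:
        if k == len(pattern):
            return True  # every pattern element has been matched
        for time, items in events:
            if prev_time is not None:
                gap = time - prev_time
                if gap < min_gap:
                    continue  # too close to (or before) the previous match
                if max_gap is not None and gap > max_gap:
                    break  # events are sorted, so later ones are farther
                if window is not None and time - first_time > window:
                    break  # match would span more than the time window
            if pattern[k] <= items:  # pattern element is a subset of event
                start = time if first_time is None else first_time
                if search(k + 1, time, start):
                    return True
        return False

    return search(0, None, None)


def support(database: List[List[Event]], pattern: List[FrozenSet[str]],
            **constraints) -> int:
    """Count the input sequences that contain `pattern`."""
    return sum(occurs(seq, pattern, **constraints) for seq in database)


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
db = [
    [(1, A), (2, B), (5, C)],
    [(1, A), (4, B)],
    [(1, A | B), (2, C)],
]

print(support(db, [A, B]))             # 2
print(support(db, [A, B], max_gap=2))  # 1
```

In this toy database, the pattern A followed by B is supported by the first two sequences when unconstrained, but only by the first once `max_gap=2` is imposed, since the second sequence has a gap of 3 between A and B. Tightening constraints can only shrink the set of supporting sequences, which is why they can reduce the number of mined patterns.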