A Survey on Similarity Measures in Text Mining (Paper)

2016, Machine Learning and Applications: An International Journal, cited by 364

Data Mining Algorithms and Applications; Data Management and Algorithms; Text and Document Classification Technologies

Abstract

The volume of text resources in digital libraries and on the internet has been increasing, and organizing these documents has become a practical need. Clustering is used to organize a large number of objects automatically into a small number of coherent groups. These documents are widely used in information retrieval and natural language processing tasks. Clustering algorithms require a metric that quantifies how dissimilar two given documents are. This difference is often measured with a similarity measure such as Euclidean distance or cosine similarity. The choice of similarity measure in text mining can help identify a suitable clustering algorithm for a specific problem. This survey discusses existing work on text similarity by partitioning it into three approaches: string-based, knowledge-based, and corpus-based similarity.
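
As an illustration of the two measures named in the abstract, the following is a minimal sketch, assuming documents are represented as raw term-frequency vectors built by whitespace tokenization. The function names (`term_frequencies`, `euclidean_distance`, `cosine_similarity`) and the tokenization are hypothetical choices made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Build a bag-of-words vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def euclidean_distance(a, b):
    """Euclidean distance between two term-frequency vectors."""
    vocabulary = set(a) | set(b)
    # Counter returns 0 for terms missing from a document.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocabulary))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = term_frequencies("text mining survey")
    d2 = term_frequencies("text mining methods")
    print(euclidean_distance(d1, d2))
    print(cosine_similarity(d1, d2))
```

Euclidean distance grows with document length, while cosine similarity depends only on the direction of the vectors, which is one reason the two can rank the same document pairs differently.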

Authors (2)

K. Kavitha

Vijaymeena M.K
