Fast webpage classification using URL features (paper)

Published 2005. Citations: 224

Topics: Web Data Mining and Analysis; Text and Document Classification Technologies; Spam and Phishing Detection

Abstract

We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is faster than typical web page classification, as the pages do not have to be fetched and analyzed. Our approach segments the URL into meaningful chunks and adds component, sequential and orthographic features to model salient patterns. The resulting features are used in supervised maximum entropy modeling. We analyze our approach's effectiveness on two standardized domains. Our results show that in certain scenarios, URL-based methods approach the performance of current state-of-the-art full-text and link-based methods.
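The feature extraction the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tokenize` and `url_features`, the feature-string format, and the example URL are all hypothetical, and the segmentation here is a plain split on non-alphanumeric characters rather than whatever chunking the paper uses. It shows one plausible reading of the three feature families: component features (which part of the URL a token came from), sequential features (adjacent token pairs), and orthographic features (here, only whether a token contains a digit).

```python
import re
from urllib.parse import urlsplit


def tokenize(text):
    """Split on runs of non-alphanumeric characters and lowercase the pieces."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def url_features(url):
    """Return a sorted list of string features for one URL (illustrative only)."""
    parts = urlsplit(url)
    # Assumption: three URL components are enough for the illustration.
    components = {
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }
    features = set()
    for name, text in components.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            features.add(f"tok={tok}")                # plain token
            features.add(f"{name}:tok={tok}")         # component feature
            if any(c.isdigit() for c in tok):
                features.add(f"{name}:has_digit")     # orthographic feature
            if i + 1 < len(tokens):
                # sequential feature: adjacent tokens within one component
                features.add(f"bigram={tok}_{tokens[i + 1]}")
        features.add(f"{name}:len={len(tokens)}")     # tokens per component
    return sorted(features)


if __name__ == "__main__":
    # Hypothetical example URL, not taken from the paper's data sets.
    for feature in url_features("http://www.cs.cornell.edu/courses/cs401/index.html"):
        print(feature)
```

Binary features of this kind could then be fed to a maximum entropy classifier, which is equivalent to multinomial logistic regression. No page content is fetched at any point, which is where the speed advantage claimed in the abstract comes from.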

Authors

Hoang Oanh Nguyen Thi

Min‐Yen Kan
