A fully automated object extraction system for the World Wide Web (Paper)

2002, cited by 217

Enterprise Software; Web Data Mining and Analysis; Algorithms and Data Compression; Advanced Database Systems and Queries

Abstract

This paper presents Omini, a fully automated object extraction system. A distinct feature of Omini is its suite of algorithms and automatically learned information extraction rules for discovering and extracting objects from dynamic or static Web pages that contain multiple object instances. We evaluated the system using more than 2,000 Web pages over 40 sites. It achieves 100% precision (returns only correct objects) and excellent recall (between 99% and 98%, with very few significant objects left out). The object boundary identification algorithms are fast, about 0.1 second per page with a simple optimization.

Authors

Ling Liu

Calton Pu

David Buttler
