Evaluating Structural Similarity in XML Documents (Paper)

2002, cited by 343

Natural Language Processing Techniques; Algorithms and Data Compression; Semantic Web and Ontologies

Abstract

XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (XTRACT) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper.

Authors

H. V. Jagadish

Andrew Nierman
