Evaluating Structural Similarity in XML Documents (Paper)
2002, cited by 343
Natural Language Processing Techniques; Algorithms and Data Compression; Semantic Web and Ontologies
Details
- Publication date: 2002-01-01
- Publication year: 2002
Keywords
Natural Language Processing Techniques; Algorithms and Data Compression; Semantic Web and Ontologies
Abstract
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (XTRACT) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper.
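To make the partitioning problem concrete, here is a minimal sketch in Python. It is not the method of the paper, whose similarity measure is not described in this abstract. As a stand-in it assumes that two documents are "similar" when they share many root-to-element tag paths (Jaccard similarity), and it groups documents with a simple greedy pass. The function names, the sample documents, and the 0.5 threshold are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def partition(docs, threshold=0.5):
    """Greedy single-pass grouping.

    Each document joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    Returns a list of groups, each a list of document indices.
    """
    clusters = []  # list of (representative_paths, member_indices)
    for i, text in enumerate(docs):
        paths = tag_paths(text)
        for rep, members in clusters:
            if jaccard(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((paths, [i]))
    return [members for _, members in clusters]


# Hypothetical sample collection: two book-like documents and one HTML-like one.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<html><body><p/></body></html>",
]
print(partition(docs))  # [[0, 1], [2]]
```

Each resulting group could then be handed to a DTD induction tool such as XTRACT. A path-set measure ignores element order and repetition, so it is only a rough proxy for structural similarity.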