Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval

arXiv cs.AI, 2026-05-28. Authors: Shiyu Chen, Tarfah Alrashed, Alon Halevy, Natasha Noy

Abstract

arXiv:2605.28787v1 (announce type: cross)

In the era of autonomous agents, machine-actionable data is critical for data-driven workflows. For more than a decade, semantic metadata such as schema.org has anchored the FAIR principles (Findable, Accessible, Interoperable, and Reusable) for machine-actionable data and enabled discovery tools like Google Dataset Search. However, the rise of Large Language Models (LLMs) capable of navigating the unstructured web raises a fundamental question: Is semantic metadata still necessary for agentic data discovery, or can agents reliably retrieve actionable data directly from the web? We present a comparative analysis of agentic data retrieval across two distinct environments: a Baseline Agent searching billions of open-web documents, and a Semantic Agent leveraging a corpus of 90 million datasets using schema.org.