Adaptive Graph Refinement and Label Propagation with LLMs for Cost-Effective Entity Resolution

arXiv cs.CL | 2026-05-26 | Authors: Hongtao Wang, Renchi Yang, Haoran Zheng, Xiangyu Ke

Abstract

arXiv:2605.25814v1 (announce type: new)

Dirty entity resolution (ER), which identifies records referring to the same real-world entity from a single, messy dataset, is a fundamental task in data management and mining. However, the dominant blocking-matching-clustering paradigm for ER suffers from critical flaws. Its cascaded, decoupled workflow essentially produces a static, sparse graph plagued by missing edges (due to blocking failures) and noisy links (due to matching errors), causing error propagation and yielding suboptimal clusters, particularly when rigid transitivity is imposed during clustering. We contend that matching and clustering are fundamentally synergistic, both optimizing for the construction of an ideal entity graph. Building upon this insight, we propose Alper, a unified framework that integrates these steps into an iterative probabilistic label propagation process over a global, evolving graph.
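To make the failure mode described in the abstract concrete, the following minimal sketch shows how rigid transitivity reacts to a single noisy link and to a single missing edge. It is not Alper and does not reproduce any part of the authors' method; it is only a plain union-find transitive closure over pairwise match decisions. The function name `transitive_clusters`, the record identifiers, and the toy match lists are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def transitive_clusters(records, matched_pairs):
    """Cluster records by the transitive closure of pairwise matches.

    Illustrative baseline only (hypothetical helper, not the paper's method):
    every matched pair is treated as a hard "same entity" constraint.
    """
    parent = {r: r for r in records}

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    # Sort for a deterministic, easy-to-compare result.
    return sorted(sorted(g) for g in groups.values())


# Toy data (assumed): records a1..a3 are one entity, b1..b2 another.
records = ["a1", "a2", "a3", "b1", "b2"]

clean_pairs = [("a1", "a2"), ("a2", "a3"), ("b1", "b2")]
# One false-positive match (a "noisy link" from a matching error).
noisy_pairs = clean_pairs + [("a3", "b1")]
# One true match never compared (a "missing edge" from a blocking failure).
sparse_pairs = [("a1", "a2"), ("b1", "b2")]

clean_result = transitive_clusters(records, clean_pairs)
noisy_result = transitive_clusters(records, noisy_pairs)
sparse_result = transitive_clusters(records, sparse_pairs)
```

In this toy setting, one false-positive edge collapses both entities into a single cluster, while one missing edge leaves `a3` stranded as a singleton. Because the graph is static, neither error can be revisited once the pairwise decisions are fixed, which is the limitation the abstract attributes to the cascaded workflow.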