AI-Assisted Systematization for Evaluating GenAI Systems 文章

ArXiv CS.CL2026-05-26NEWSen作者: Dhruv Agarwal, Emily Sheng, Chad Atalla, Jean Garcia-Gathright, Hussein Mozannar, Hannah Washington, Alexandra Chouldechova, Solon Barocas, Hanna Wallach

摘要

arXiv:2605.26001v1 Announce Type: new Abstract: Evaluating generative AI (GenAI) systems is challenging because many targets of evaluation are broad, contested concepts, such as "reasoning," "fairness," or "creativity." When these concepts are left underspecified, it becomes unclear what should be measured or how evaluation results should be interpreted. This problem reflects a missing step: systematization, that is, moving from a broad background concept to an explicit, structured account of the concept in measurable terms. To help address the fact that systematization is cognitively demanding and resource-intensive, we investigate whether AI assistance can support this process. To enable AI-assisted systematization and assess its quality, we introduce a structured representation of a systematized concept, a concept spec, and a validation worksheet.