Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach

arXiv cs.CL | 2026-05-28 | News | English | Authors: Nicolás Benjamín Ocampo, Agnes Paullate Nyiranziza, Davide Ceolin

Abstract

arXiv:2605.28313v1

Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and judgment tasks. However, assessing argument quality requires rigorous evaluation, and we investigate the extent to which LLMs can perform this task effectively. We tested 12 open-weight LLMs of different sizes and families under zero-shot, few-shot, and chain-of-thought prompting to approximate expert pairwise comparisons of argument quality across three dimensions (logical, rhetorical, and dialectic). We then used these comparisons in a Bradley-Terry model to infer latent strength scores and derive a ranking of arguments. Our results show that LLMs achieve promising but moderate correlation with human expert judgments. Llama-70B obtained the strongest alignment, reaching a moderate Cohen's κ of 0.493 and moderate correlations (Kendall, Pearson, and Spearman: 0.327 to 0.477) with Bradley-Terry scores derived from the expert annotations.
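For readers unfamiliar with the pipeline described in the abstract, here is a minimal sketch of its three ingredients. It is not the authors' code. The Bradley-Terry model assigns each argument i a strength p_i so that P(i beats j) = p_i / (p_i + p_j). The sketch fits these strengths from (winner, loser) pairs using the standard minorization-maximization (Zermelo/Hunter) iteration, then computes Cohen's κ and Kendall's τ as agreement measures. Several details are assumptions made for illustration: the function names, the sample data, the label names "first"/"second", and the use of τ-a. The paper may use a tie-corrected variant such as τ-b. The fit also assumes the directed win graph is strongly connected, because otherwise the maximum-likelihood estimate does not exist.

```python
from collections import defaultdict
from itertools import combinations


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths p_i, where P(i beats j) = p_i / (p_i + p_j).

    `comparisons` is a list of (winner, loser) pairs (hypothetical format).
    Uses the classic minorization-maximization update; assumes the win graph
    is strongly connected. Strengths are normalized to sum to 1.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)    # total wins for each item
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1

    p = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n_pair[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and frozenset((i, j)) in n_pair
            )
            new_p[i] = wins[i] / denom
        total = sum(new_p.values())
        new_p = {i: v / total for i, v in new_p.items()}  # fix the scale
        converged = max(abs(new_p[i] - p[i]) for i in items) < tol
        p = new_p
        if converged:
            break
    return p


def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p_e = sum((a.count(k) / n) * (b.count(k) / n) for k in set(a) | set(b))
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) between two score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical pairwise judgments over three arguments A, B, C.
comparisons = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("C", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
]
strengths = fit_bradley_terry(comparisons)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']

# Hypothetical per-pair preference labels from an expert and an LLM.
expert = ["first", "first", "second", "second"]
llm = ["first", "second", "second", "second"]
print(cohens_kappa(expert, llm))  # 0.5

# Rank agreement between two hypothetical sets of strength scores.
print(kendall_tau([3, 2, 1], [3, 1, 2]))
```

In this balanced round-robin example every pair is compared equally often, so the fitted ranking follows the win counts (A: 4, B: 3, C: 2). With unbalanced designs, the strengths also reflect the strength of each argument's opponents, which is the main reason to use Bradley-Terry rather than raw win counts.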