Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach

ArXiv CS.CL | 2026-05-28 | NEWS | en | Authors: Nicolás Benjamín Ocampo, Agnes Paullate Nyiranziza, Davide Ceolin

Details

Source site: ArXiv CS.CL
Authors: Nicolás Benjamín Ocampo, Agnes Paullate Nyiranziza, Davide Ceolin
Article type: NEWS
Language: en
Publication date: 2026-05-28

Abstract

arXiv:2605.28313v1 (announce type: new)

Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks related to reasoning and judgment. Assessing the quality of arguments, however, requires rigorous evaluation, and we investigate the extent to which LLMs can perform this task effectively. We tested 12 open-weight LLMs of different sizes and families under zero-shot, few-shot, and chain-of-thought prompting to approximate expert pairwise comparisons of argument quality across three dimensions (logical, rhetorical, and dialectic). We then used these comparisons in a Bradley-Terry model to infer latent strength scores and derive a ranking of arguments. Our results show that LLM judgments agree with human expert judgments to a promising but moderate degree. Llama-70B obtains the strongest alignment, reaching a moderate Cohen's $\kappa$ of 0.493 and moderate correlations (Kendall, Pearson, and Spearman: 0.327 to 0.477) with the Bradley-Terry scores derived from the expert annotations.
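To make the Bradley-Terry step concrete: the model assumes each argument i has a latent strength p_i > 0 and that argument i is preferred over argument j with probability p_i / (p_i + p_j). The sketch below fits those strengths from a list of pairwise outcomes and sorts the arguments by them. It is only an illustration under stated assumptions: the abstract does not say which estimation procedure the authors used, so the standard minorization-maximization (MM) update is swapped in here, and the function name `fit_bradley_terry`, the argument labels, and the toy comparisons are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic MM update p_i <- W_i / sum_j n_ij / (p_i + p_j),
    where W_i is the number of wins of item i and n_ij is the number of
    times i and j were compared. Hypothetical helper, not the paper's code.
    Note: an item with zero wins is driven toward strength 0.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}  # uniform starting point
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        # Strengths are only identified up to scale, so fix their mean at 1.
        total = sum(updated.values())
        updated = {i: v * len(items) / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return strength


# Toy data (invented for illustration): each tuple is (preferred, other).
comparisons = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 3 + [("C", "B")] * 1
    + [("A", "C")] * 3 + [("C", "A")] * 1
)
strengths = fit_bradley_terry(comparisons)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

In a setup like the one the abstract describes, the same fit would presumably be run once on the expert comparisons and once on each LLM's comparisons, and the two resulting score vectors would then be compared with Kendall, Pearson, or Spearman correlation.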
