On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

ArXiv cs.CL | 2026-06-02 | Authors: Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

Abstract

arXiv:2606.00467v1

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's familiarity with data and task definitions affects performance, (2) the extent to which additional information in prompts can correct zero-shot errors ("decision stickiness"), and (3) model susceptibility to misaligned task definitions. Through experiments on toxicity detection across diverse datasets (spanning social media, gaming, news, and forums) using both dense and mixture-of-experts models, we find that nearly two-thirds of zero-shot errors are resistant to correction, with an overall rescue rate (fraction of initial errors corrected by prompting) of only 34.8%. High-confidence errors prove especially resistant to correction.
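The rescue rate defined in the abstract (the fraction of initial errors corrected by prompting) can be illustrated with a minimal sketch. The function name `rescue_rate`, its three parallel label lists, and the toy labels below are assumptions made for illustration only; they are not the authors' code or data.

```python
from typing import Sequence


def rescue_rate(
    gold: Sequence[int],
    zero_shot: Sequence[int],
    prompted: Sequence[int],
) -> float:
    """Fraction of zero-shot errors that become correct after re-prompting.

    Hypothetical helper: `gold` holds reference labels, `zero_shot` the
    model's initial predictions, and `prompted` its predictions after
    extra information is added to the prompt.
    """
    if not (len(gold) == len(zero_shot) == len(prompted)):
        raise ValueError("all label sequences must have the same length")

    # Indices where the zero-shot prediction disagreed with the reference.
    initial_errors = [i for i, (g, z) in enumerate(zip(gold, zero_shot)) if g != z]
    if not initial_errors:
        raise ValueError("no zero-shot errors, so the rescue rate is undefined")

    # An error counts as rescued when the re-prompted prediction matches gold.
    rescued = sum(1 for i in initial_errors if prompted[i] == gold[i])
    return rescued / len(initial_errors)


# Toy labels (1 = toxic, 0 = non-toxic), invented for this example.
gold = [1, 0, 1, 1, 0, 0]
zero_shot = [0, 0, 0, 1, 1, 0]  # wrong at indices 0, 2, 4
prompted = [1, 0, 0, 1, 1, 0]   # only index 0 is corrected
rate = rescue_rate(gold, zero_shot, prompted)
```

Under this definition, the reported 34.8% rescue rate leaves 65.2% of zero-shot errors uncorrected, which is the "nearly two-thirds" figure in the abstract.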