Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring

arXiv cs.CL, 2026-05-26. Authors: Zhengyang Wang, Sanwoo Lee, Jiaxin Wang, Chenxi Miao, Weikang Li, Yunfang Wu

Abstract

arXiv:2605.25731v1

Multi-trait essay scoring aims to provide fine-grained evaluation of writing quality across multiple dimensions. However, how to effectively post-train autoregressive scoring models remains underexplored. In this paper, we propose Trait-Aware Policy Optimization (TAPO), a post-training framework tailored to autoregressive multi-trait scoring. Our method decomposes rewards along both the sample and trait dimensions, combining four signals: global scoring consistency, trait-level accuracy, format validity, and inter-trait dependency preservation. In addition, we strengthen supervised fine-tuning with enhanced prompts, allowing the model to internalize trait semantics before preference optimization.
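The abstract names four reward components but gives no formulas, so the following is only a minimal sketch of how such a composite reward could be assembled, not the authors' formulation. The trait names, the 1 to 5 score range, the `trait: score` output format, the distance-based definition of each component, and the mixing weights are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Hypothetical trait set and score range; the abstract does not specify either.
TRAITS = ("content", "organization", "language")
SCORE_MIN, SCORE_MAX = 1, 5


def parse_scores(text):
    """Parse 'trait: score' pairs; return None if a trait is missing or out of range."""
    found = {}
    for name, value in re.findall(r"(\w+)\s*:\s*(-?\d+)", text):
        found[name.lower()] = int(value)
    if any(t not in found for t in TRAITS):
        return None
    scores = {t: found[t] for t in TRAITS}
    if any(not SCORE_MIN <= s <= SCORE_MAX for s in scores.values()):
        return None
    return scores


def trait_rewards(pred, gold):
    """Per-trait accuracy (assumed form): 1 minus the normalized absolute error."""
    span = SCORE_MAX - SCORE_MIN
    return {t: 1.0 - abs(pred[t] - gold[t]) / span for t in TRAITS}


def global_reward(pred, gold):
    """Sample-level consistency (assumed form): agreement of the summed scores."""
    span = (SCORE_MAX - SCORE_MIN) * len(TRAITS)
    return 1.0 - abs(sum(pred.values()) - sum(gold.values())) / span


def dependency_reward(pred, gold):
    """Inter-trait dependency (assumed form): pairwise score gaps should match the gold gaps."""
    pairs = list(combinations(TRAITS, 2))
    max_err = 2 * (SCORE_MAX - SCORE_MIN)  # largest possible gap mismatch per pair
    err = sum(abs((pred[a] - pred[b]) - (gold[a] - gold[b])) for a, b in pairs)
    return 1.0 - err / (max_err * len(pairs))


def total_reward(text, gold, weights=(0.3, 0.4, 0.1, 0.2)):
    """Weighted sum of the four components; the weights are illustrative only."""
    pred = parse_scores(text)
    if pred is None:  # format validity fails, so no reward is given
        return 0.0
    w_global, w_trait, w_format, w_dep = weights
    mean_trait = sum(trait_rewards(pred, gold).values()) / len(TRAITS)
    return (w_global * global_reward(pred, gold)
            + w_trait * mean_trait
            + w_format * 1.0
            + w_dep * dependency_reward(pred, gold))


if __name__ == "__main__":
    gold = {"content": 4, "organization": 3, "language": 5}
    print(total_reward("content: 4, organization: 3, language: 4", gold))
    print(total_reward("I think this essay is good.", gold))
```

In this sketch the per-trait dictionary returned by `trait_rewards` is kept separate from the scalar sample-level terms, which is one way to leave room for trait-level credit assignment; how TAPO actually uses the two dimensions is not stated in the abstract.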
