Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

arXiv cs.CL, 2026-06-04. Authors: Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu

Abstract

arXiv:2605.29076v2 (Announce Type: replace)

LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label-only) fine-tuning scales well but offers limited reasoning on complex text and lacks broader model transparency, while discrete prompt optimization yields human-readable instructions but struggles with performance and scalability. We introduce eXTC (eXplainable Text Classifier), which proceeds in three progressive stages: (1) learning a Standard Operating Procedure (SOP, or rulebook) in natural language via a new Structured Prompt Optimization algorithm; (2) distilling SOP-grounded reasoning from a large teacher LLM into a compact LM; and (3) expanding reasoning capabilities beyond the initial SOP via reinforcement learning.
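Stage (1), learning a natural-language rulebook by optimizing it against labeled data, can be illustrated with a deliberately simplified sketch. It is not the paper's Structured Prompt Optimization algorithm, whose details the abstract does not give. Several assumptions replace the missing details. The SOP is a hypothetical ordered list of keyword rules. A toy first-match classifier stands in for the LLM that would follow the rulebook. A plain greedy hill-climbing search over a hypothetical pool of candidate rules stands in for the optimizer. All names (`Rule`, `SOP`, `toy_classify`, `greedy_optimize`) and the example data are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical SOP entry: if `keyword` appears in the text, predict `label`."""
    keyword: str
    label: str

    def render(self) -> str:
        return f'If the text mentions "{self.keyword}", classify it as {self.label}.'


@dataclass
class SOP:
    """Hypothetical rulebook: ordered rules plus a fallback label."""
    rules: list[Rule] = field(default_factory=list)
    default: str = "other"

    def to_prompt(self) -> str:
        # Render the rulebook as numbered, human-readable instructions.
        lines = [f"{i + 1}. {r.render()}" for i, r in enumerate(self.rules)]
        lines.append(f"{len(self.rules) + 1}. Otherwise, classify it as {self.default}.")
        return "\n".join(lines)


def toy_classify(sop: SOP, text: str) -> str:
    # Toy stand-in for an LLM following the SOP: the first matching rule wins.
    lowered = text.lower()
    for rule in sop.rules:
        if rule.keyword in lowered:
            return rule.label
    return sop.default


def accuracy(sop: SOP, data: list[tuple[str, str]]) -> float:
    return sum(toy_classify(sop, x) == y for x, y in data) / len(data)


def greedy_optimize(sop: SOP, candidates: list[Rule],
                    data: list[tuple[str, str]], max_rules: int = 5) -> tuple[SOP, float]:
    # Greedy hill climbing: append the candidate rule that most improves
    # accuracy on the labeled data; stop when no candidate strictly helps.
    current = SOP(list(sop.rules), sop.default)
    best = accuracy(current, data)
    while len(current.rules) < max_rules:
        scored = [
            (accuracy(SOP(current.rules + [c], current.default), data), c)
            for c in candidates
            if c not in current.rules
        ]
        if not scored:
            break
        score, cand = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        current.rules.append(cand)
        best = score
    return current, best


# Usage on a tiny, made-up labeled set.
dev = [
    ("Refund my payment please", "billing"),
    ("The app crashes on login", "bug"),
    ("Invoice amount is wrong", "billing"),
    ("What a lovely day", "other"),
]
pool = [Rule("refund", "billing"), Rule("invoice", "billing"),
        Rule("crash", "bug"), Rule("day", "bug")]
sop, acc = greedy_optimize(SOP(), pool, dev)
print(sop.to_prompt())
print(acc)
```

Under these assumptions the search keeps the "refund", "invoice", and "crash" rules, rejects the misleading "day" rule, and reaches an accuracy of 1.0 on the toy set. The optimized artifact stays human-readable: `to_prompt()` renders it as numbered instructions rather than opaque weights.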