Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback (Event)

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.28010v1 (announce type: new)

Abstract: Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated supervision. However, in many reasoning domains, the model must also validate generated tasks and judge generated answers to obtain training signals. This creates a training-signal challenge: erroneous self-judgments become erroneous gradient