Event: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2511.16886v5 | Announce Type: replace | Abstract: Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively added layers remains behind the capabilities of one-pass models with th