Latent Reasoning in TRMs is Secretly a Policy Improvement Operator

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2511.16886v5 Announce Type: replace

Abstract: Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively added layers remains behind the capabilities of one-pass models with th
