Sandboxed Coding Agents are Competitive Omni-modal Task Solvers (Event)

SHUTDOWN | 2026-06-02 | Impact: LOW

arXiv:2606.00579v1 | Announce Type: cross

Abstract: As multimodal LLMs increasingly target video and audio, it is often assumed that such tasks require native omnimodal models. We show that this is not always the case: coding agents with only text+image access and a sandboxed tool-use interface can match, and in several settings outperform, SOTA native omnimodal models and predefined multimodal agent scaffolds across multiple audio-v