VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents

Event: PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.30256v1 Announce Type: new

Abstract: Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent interaction, agents must model full-duplex audiovisual conversation; however, existing full-duplex benchmarks evaluate only speech. In