VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
Event: PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM
arXiv:2605.30256v1 · Announce Type: new
Abstract: Natural human conversation is full-duplex and audio-visual: people simultaneously speak and listen while continuously interpreting and producing nonverbal cues, such as nods, smiles, and gestures. To support successful human-agent interaction, agents must model full-duplex audiovisual conversation; however, existing full-duplex benchmarks evaluate only speech. In
Related companies
World Labs · RESEARCH_INSTITUTE
Span · NONPROFIT
ACTION · NONPROFIT
InterAction · NONPROFIT
Framework · COMPANY
ACT · NONPROFIT
Respect · NONPROFIT
Ratio · RESEARCH_INSTITUTE
clips · COMPANY