Your Multimodal Speech Model Says I Have a Face for Radio (Event)

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.30472v1 (Announce Type: new)

Abstract: As large neural models have become better at language tasks, researchers are increasingly building multi- and omnimodal models that handle more modalities of data. One example is the expansion of speech recognition models to audio-visual data for noise mitigation and multimodal subtitling. While performance and bias have been studied extensively in the single-modality regime, it is unknown