audio · Related events
IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Hierarchical Semantic-Constrained Heterogeneous Graph for Audio-Visual Event Localization
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
FIGMA: Towards FIne-Grained Music retrievAl
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Multilingual Multi-Speaker Unit Vocoders: A Systematic Analysis of Discrete Speech Representations
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
HybridCodec: Fast Dual-Stream, Semantically Enhanced Neural Audio Codec
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
dots.tts Technical Report
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
MMAE: A Massive Multitask Audio Editing Benchmark
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
AIDEN: Design and Pilot Study of an AI Assistant for the Visually Impaired
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Audio-Visual World Models: Grounding Multisensory Imagination for Embodied Agents
2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM
Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
UniVoice: A Unified Model for Speech and Singing Voice Generation
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation
2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM
Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models
2026-06-05 · BREAKTHROUGH · Impact: HIGH
Forgive or forget: Understanding the context of hate in audio retrieval systems
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
ProSarc: Prosody-Aware Sarcasm Recognition Framework via Temporal Prosodic Incongruity
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding
2026-06-05 · BREAKTHROUGH · Impact: HIGH
PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
MAviS: A Multimodal Conversational Assistant For Avian Species
2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM
SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Building The Ph(ysical)AI Layer Of Machine Intelligence
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities
2026-06-04 · OPEN_SOURCE · Impact: MEDIUM
DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities
2026-06-04 · BREAKTHROUGH · Impact: HIGH
Audio Interaction Model
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
VGGSounder: Audio-Visual Evaluations for Foundation Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding
2026-06-04 · BREAKTHROUGH · Impact: HIGH
Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors
2026-06-04 · OPEN_SOURCE · Impact: MEDIUM
Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification
2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM
Cosmos 3: Omnimodal World Models for Physical AI
2026-06-03 · OPEN_SOURCE · Impact: MEDIUM
AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes
2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM
Cosmos 3: Omnimodal World Models for Physical AI
2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM
Cosmos 3: Omnimodal World Models for Physical AI
2026-06-03 · BREAKTHROUGH · Impact: HIGH
JAVEDIT: Joint Audio-Visual Instruction-Guided Video Editing with Agentic Data Curation
2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM
Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation
2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM