audio (product)

Source: GitHub · OPEN_SOURCE (open source) · Python · BSD-2-Clause · Released 2017-05-05

Data manipulation and transformation for audio signal processing, powered by PyTorch

2872 Stars

772 Forks

Tech stack

Alternatives

audio · Related events

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Hierarchical Semantic-Constrained Heterogeneous Graph for Audio-Visual Event Localization

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

FIGMA: Towards FIne-Grained Music retrievAl

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Multilingual Multi-Speaker Unit Vocoders: A Systematic Analysis of Discrete Speech Representations

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

HybridCodec: Fast Dual-Stream, Semantically Enhanced Neural Audio Codec

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

dots.tts Technical Report

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

MMAE: A Massive Multitask Audio Editing Benchmark

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

AIDEN: Design and Pilot Study of an AI Assistant for the Visually Impaired

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Audio-Visual World Models: Grounding Multisensory Imagination for Embodied Agents

2026-06-08 · PRODUCT_LAUNCH · Impact: MEDIUM

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

UniVoice: A Unified Model for Speech and Singing Voice Generation

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation

2026-06-06 · PRODUCT_LAUNCH · Impact: MEDIUM

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

2026-06-05 · BREAKTHROUGH · Impact: HIGH

Forgive or forget: Understanding the context of hate in audio retrieval systems

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

ProSarc: Prosody-Aware Sarcasm Recognition Framework via Temporal Prosodic Incongruity

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

2026-06-05 · BREAKTHROUGH · Impact: HIGH

PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

MAviS: A Multimodal Conversational Assistant For Avian Species

2026-06-05 · PRODUCT_LAUNCH · Impact: MEDIUM

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

Building The Ph(ysical)AI Layer Of Machine Intelligence

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

2026-06-04 · OPEN_SOURCE · Impact: MEDIUM

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

2026-06-04 · BREAKTHROUGH · Impact: HIGH

Audio Interaction Model

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

VGGSounder: Audio-Visual Evaluations for Foundation Models

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

2026-06-04 · BREAKTHROUGH · Impact: HIGH

Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors

2026-06-04 · OPEN_SOURCE · Impact: MEDIUM

Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification

2026-06-04 · PRODUCT_LAUNCH · Impact: MEDIUM

Cosmos 3: Omnimodal World Models for Physical AI

2026-06-03 · OPEN_SOURCE · Impact: MEDIUM

AVTrack: Audio-Visual Tracking in Human-centric Complex Scenes

2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM

Cosmos 3: Omnimodal World Models for Physical AI

2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM

Cosmos 3: Omnimodal World Models for Physical AI

2026-06-03 · BREAKTHROUGH · Impact: HIGH

JAVEDIT: Joint Audio-Visual Instruction-Guided Video Editing with Agentic Data Curation

2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM

Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation

2026-06-03 · PRODUCT_LAUNCH · Impact: MEDIUM