Latest in AI

Showing:audioDevelopersClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

NVIDIA 推出 Nemotron 3 Nano Omni：支援長文本的多模態智慧模型，專為文件、語音與影片 Agent 設計★ 75
Hugging Face Blog90 days agoRelease
NVIDIA has officially launched a new lightweight multimodal model, "Nemotron 3 Nano Omni." This model is designed to deliver powerful multimodal intelligence…
Google DeepMind 推出全新改進版 Gemini 音訊模型，打造更強大的語音互動體驗★ 85
Google DeepMind Blog227 days agoRelease
Google DeepMind has announced a major upgrade to its Gemini audio models, aimed at delivering a more natural, fluid, and low-latency voice interaction…
Google 發表 Gemma 3n 預覽版：強大、高效且行動優先的端側多模態 AI 模型★ 78
Google DeepMind Blog434 days agoRelease
Google DeepMind has officially released a preview of its new open model "Gemma 3n." This is a cutting-edge open model purpose-built for mobile devices and…
Hugging Face 推出極速 Whisper 語音轉文字 Inference Endpoints 部署方案★ 75
Hugging Face Blog441 days agoNew Tool
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
評估音訊推理能力：Hugging Face 推出 Big Bench Audio 基準測試★ 75
Hugging Face Blog585 days agoRelease
As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
使用 🤗 Transformers 微調 W2V2-BERT 以進行低資源語音辨識 (ASR)★ 75
Hugging Face Blog921 days agoTutorial
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
在 Replicate 上微調 MusicGen，輕鬆生成任何風格的音樂
Replicate Blog1,019 days agoNew Tool
AI cloud deployment and runtime platform Replicate has announced official support for fine-tuning Meta's open-source music generation model MusicGen. This new…
Hugging Face 音訊資料集完整指南：從載入、預處理到串流處理★ 70
Hugging Face Blog1,321 days agoTutorial
With the rapid growth of voice AI (such as Whisper), efficiently handling audio datasets has become critically important. This guide from the official Hugging…
Hugging Face Datasets 推出全新音訊與電腦視覺文件指南
Hugging Face Blog1,461 days agoRelease
Hugging Face announced new official Audio and Vision documentation guides for its core open-source library `datasets`. As multimodal AI models continue to…
使用 🤗 Transformers 在 Hugging Face 中微調 Wav2Vec2 進行英文語音辨識 (ASR)★ 70
Hugging Face Blog1,964 days agoTutorial
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…