Latest in AI

Showing:speech-recognitionResearchersClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

Mistral AI Launches Voxtral: Audio Speech and Understanding Model
Mistral AI News40 days agoRelease
Mistral AI has announced Voxtral, its debut audio-native language model family targeting speech recognition, multilingual transcription, and audio comprehension. Available in two sizes via Mistral's La Plateforme API, it extends the company's portfolio decisively into multimodal AI. The release positions Mistral as a full-stack AI provider capable of handling voice and audio alongside its established text and code capabilities.
Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues
r/LocalLLaMA top day47 days agoBenchmark
A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
Hugging Face Blog48 days agoBenchmark
Code-switching—where bilingual speakers blend two languages in a single utterance—is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams make informed ASR model choices when deploying voice agents for multilingual customer-facing applications.
Introducing Cohere Transcribe: A New State-of-the-Art in Open-Source Speech Recognition★ 80
Cohere Blog50 days agoRelease
Cohere has announced "Cohere Transcribe," a new state-of-the-art open-source speech recognition model. Designed to deliver highly accurate and efficient speech-to-text capabilities, it represents Cohere's expansion into open-source audio AI. The model aims to challenge existing industry benchmarks like OpenAI's Whisper by offering superior multilingual performance.
How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent
Hugging Face Blog54 days agoTutorial
This Hugging Face Blog post appears to be a practical tutorial for fine-tuning NVIDIA Nemotron 3.5 ASR. Based on the title, it focuses on adapting speech recognition to a target language, specialized domain, or accent. The original text was not provided, so implementation details, datasets, commands, metrics, and hardware requirements cannot be confirmed.
Hugging Face 為 Open ASR 排行榜引入「防刷榜機制」，使用私有測試數據打擊 Benchmaxxer★ 75
Hugging Face Blog83 days agoRelease
Hugging Face has recently made a major update to its popular Open ASR (Automatic Speech Recognition) leaderboard, aimed at combating the increasingly serious…
Hugging Face 音訊資料集完整指南：從載入、預處理到串流處理★ 70
Hugging Face Blog1,321 days agoTutorial
With the rapid growth of voice AI (such as Whisper), efficiently handling audio datasets has become critically important. This guide from the official Hugging…
在 🤗 Transformers 中使用 n-gram 提升 Wav2Vec2 語音識別效能
Hugging Face Blog1,658 days agoTutorial
This technical blog post from Hugging Face introduces how combining n-gram language models (LMs) can significantly improve the performance of Wav2Vec2…