Latest in AI

Mistral AI Launches Voxtral: Audio Speech and Understanding Model
Mistral AI News · 40 days ago · Release
Mistral AI has announced Voxtral, its debut audio-native language model family targeting speech recognition, multilingual transcription, and audio comprehension. Available in two sizes via Mistral's La Plateforme API, it extends the company's portfolio decisively into multimodal AI. The release positions Mistral as a full-stack AI provider capable of handling voice and audio alongside its established text and code capabilities.
Voxtral Transcribes at the Speed of Sound
Mistral AI News · 40 days ago · Release
Mistral AI has unveiled Voxtral, its speech transcription model built around near-real-time processing speed. The announcement, framed as a research release, positions Voxtral as a competitive alternative in the automatic speech recognition (ASR) space. The "speed of sound" framing suggests the model's key differentiator is low-latency, fast transcription suitable for demanding production workloads.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
Hugging Face Blog · 48 days ago · Benchmark
Code-switching, where bilingual speakers blend two languages in a single utterance, is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams make informed ASR model choices when deploying voice agents for multilingual customer-facing applications.
Voxtral ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Dockerized Nemotron 3.5 ASR: Better Multilingual Support & Streaming (4.5x CPU Speed)
r/LocalLLaMA top day · 51 days ago · New Tool
A developer on Reddit shared a Dockerized implementation of Nemotron 3.5 ASR, migrating from Parakeet. The system supports over 40 languages and features a native streaming architecture that avoids full-file buffering. Using the onnxruntime-genai backend, it achieves 4.5x real-time speed on CPU, with CUDA support planned but untested.
How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent
Hugging Face Blog · 54 days ago · Tutorial
This Hugging Face Blog post appears to be a practical tutorial for fine-tuning NVIDIA Nemotron 3.5 ASR. Based on the title, it focuses on adapting speech recognition to a target language, specialized domain, or accent. The original text was not provided, so implementation details, datasets, commands, metrics, and hardware requirements cannot be confirmed.
Hugging Face Adds an Anti-Gaming Mechanism to the Open ASR Leaderboard, Using Private Test Data to Counter Benchmaxxers ★ 75
Hugging Face Blog · 83 days ago · Release
Hugging Face has recently made a major update to its popular Open ASR (Automatic Speech Recognition) leaderboard, aimed at combating the increasingly serious…
Hugging Face Launches New Open ASR Leaderboard Tracks: A Focus on Multilingual and Long-Form Audio Speech Recognition ★ 75
Hugging Face Blog · 249 days ago · Release
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
High-Performance ASR, Speaker Recognition, and Speculative Decoding with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 818 days ago · Tutorial
This technical blog post from Hugging Face introduces how to build a powerful and efficient speech processing system using Hugging Face Inference Endpoints — a…
Fine-Tuning W2V2-BERT for Low-Resource Speech Recognition (ASR) with 🤗 Transformers ★ 75
Hugging Face Blog · 921 days ago · Tutorial
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
Fine-Tuning MMS Adapter Models: Building Dedicated Speech Recognition (ASR) for Low-Resource Languages ★ 70
Hugging Face Blog · 1,135 days ago · Tutorial
Meta's MMS (Massively Multilingual Speech) project, released in 2023, extends speech technology to over 1,000 languages, covering automatic speech recognition…
AI Speech Recognition in Unity: Easily Integrating the Whisper Model via the Hugging Face API
Hugging Face Blog · 1,152 days ago · Tutorial
This official Hugging Face blog post details how to quickly implement AI speech recognition (Automatic Speech Recognition, ASR) functionality in the Unity game…
Microsoft SpeechT5 Lands on Hugging Face: A Unified Multi-Purpose Model for Speech Synthesis, Recognition, and Conversion ★ 75
Hugging Face Blog · 1,266 days ago · Release
Microsoft's SpeechT5 model has been officially integrated into Hugging Face's Transformers library. This represents a significant advancement in the field of…
Fine-Tuning Whisper for Multilingual Speech Recognition (ASR) with 🤗 Transformers ★ 80
Hugging Face Blog · 1,363 days ago · Tutorial
OpenAI's Whisper is a powerful automatic speech recognition (ASR) model. While its zero-shot capabilities are impressive, there remains significant room for…
Automatic Speech Recognition (ASR) on Very Long Audio Files with Wav2Vec2 in 🤗 Transformers
Hugging Face Blog · 1,638 days ago · Tutorial
In the field of automatic speech recognition (ASR), Wav2Vec2 is a revolutionary model, but it faces a significant challenge when processing long audio files…
Boosting Wav2Vec2 Speech Recognition Performance with n-grams in 🤗 Transformers
Hugging Face Blog · 1,658 days ago · Tutorial
This technical blog post from Hugging Face introduces how combining n-gram language models (LMs) can significantly improve the performance of Wav2Vec2…
Fine-Tuning XLSR-Wav2Vec2 for Low-Resource Speech Recognition (ASR) with 🤗 Transformers
Hugging Face Blog · 1,716 days ago · Tutorial
Automatic speech recognition (ASR) has achieved remarkable success for resource-rich languages such as English and standard Mandarin, but building…
Fine-Tuning Wav2Vec2 for English Speech Recognition (ASR) with 🤗 Transformers on Hugging Face ★ 70
Hugging Face Blog · 1,964 days ago · Tutorial
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…
