Voxtral TTS: Open-Weights, Low-Latency Text-to-Speech from Mistral AI
Original: "Speaking of Voxtral. Voxtral TTS: A frontier, open-weights text-to-speech model that's fast, instantly adaptable, and produces lifelike speech for voice agents." Mistral AI, Research, March 23, 2026.
Mistral AI released Voxtral TTS, a 4B-parameter, multilingual, low-latency text-to-speech model for voice agents.
Mistral AI introduced Voxtral TTS, its first text-to-speech model, focused on realistic multilingual voice generation. The 4B-parameter model supports nine languages, adapts quickly to a new voice from short reference samples, and streams audio with low latency for voice agents. Mistral says human evaluations rate its naturalness above ElevenLabs Flash v2.5. The model is available through the API, for testing in Studio, in Le Chat, and as open weights on Hugging Face.
This summary was compiled by AI; the original article is authoritative.