Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Mistral AI introduced Voxtral TTS, its first text-to-speech model, targeting natural multilingual voice generation across nine languages. The 4B-parameter model supports voice adaptation from short references, emotional expressiveness, dialect handling, and low-latency streaming. It is available through API, Mistral Studio, and Le Chat, with open weights on Hugging Face under a non-commercial CC BY NC 4.0 license.