Latest in AI

Showing: voice, Researchers


Mistral AI Launches Voxtral: Speech and Audio Understanding Model
Mistral AI News · 40 days ago · Release
Mistral AI has announced Voxtral, its debut audio-native language model family targeting speech recognition, multilingual transcription, and audio comprehension. Available in two sizes via Mistral's La Plateforme API, it extends the company's portfolio decisively into multimodal AI. The release positions Mistral as a full-stack AI provider capable of handling voice and audio alongside its established text and code capabilities.
Thinking Machines Launches Native Interaction Model TML-Interaction-Small 276B-A12B: Beating Real-Time Voice SOTA and Retiring Traditional VAD ★ 85
Latent Space · 77 days ago · Release
According to AINews, the AI research team Thinking Machines (affectionately nicknamed "Team Thinky" by the community) has recently unveiled a new native…
OpenAI Launches GPT-Realtime-2, GPT-Translate, and GPT-Whisper: New SOTA Real-Time Voice APIs ★ 85
Latent Space · 81 days ago · Release
OpenAI has continued to expand the reach of its GPT-5 technology, officially launching three new voice and audio APIs: GPT-Realtime-2, GPT-Translate, and…
Google DeepMind Launches Gemini 3.1 Flash Live: Making Voice AI More Natural and More Reliable ★ 85
Google DeepMind Blog · 124 days ago · Release
Google DeepMind has officially unveiled its latest voice model, "Gemini 3.1 Flash Live." This model is positioned to deliver lower-latency, higher-precision…
EVA: ServiceNow AI Launches a New Voice Agent Evaluation Framework ★ 75
Hugging Face Blog · 126 days ago · Release
With the proliferation of GPT-4o, Gemini Live, and various end-to-end voice models, Voice Agents have become an important frontier in AI applications. However…
Hugging Face Launches Voice Consent Gate: A Secure Authorization Mechanism for Voice Cloning ★ 75
Hugging Face Blog · 273 days ago · New Tool
With the rapid advancement of voice cloning technology, generating hyper-realistic synthetic voices has become remarkably easy. However, this has also…
Deploying Speech-to-Speech Models on Hugging Face ★ 75
Hugging Face Blog · 644 days ago · Tutorial
As real-time voice interaction technologies like GPT-4o become more widespread, the open-source community is also actively developing speech-to-speech (S2S)…
High-Performance ASR, Speaker Diarization, and Speculative Decoding with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 818 days ago · Tutorial
This technical blog post from Hugging Face introduces how to build a powerful and efficient speech processing system using Hugging Face Inference Endpoints — a…