Latest in AI

Noiz AI, HKUST & Tsinghua Open-Source Audio Generation Model: 4 Steps, 0.24s on One GPU
量子位 QbitAI · 43 days ago · Paper
Noiz AI has partnered with Hong Kong University of Science and Technology (HKUST) and Tsinghua University to open-source a large audio generation model. The model's standout claim is efficiency: it needs just four sampling steps to produce audio, and inference completes in 0.24 seconds on a single GPU. The open-source release brings research-grade, low-latency audio synthesis within reach of developers and researchers globally.
TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)
r/LocalLLaMA top day · 48 days ago · Benchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting; it now covers 46 speech synthesis models. Hosted as a Hugging Face Space, the arena lets users vote on audio quality without knowing which model produced each sample, and those votes feed a dynamic ELO leaderboard. The project is open source on GitHub and welcomes community submissions of new models.
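For readers unfamiliar with how blind pairwise votes become a leaderboard, here is a minimal sketch of a standard Elo update. It is not taken from the benchmark's code: the K-factor of 32, the starting rating of 1000, and the names `expected_score`, `apply_vote`, and `leaderboard` are all assumptions made for illustration, and the arena may use a different rating scheme.

```python
from collections import defaultdict

K_FACTOR = 32.0          # assumed; the arena's real K-factor is not stated
INITIAL_RATING = 1000.0  # assumed starting rating for every model


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_vote(ratings, winner, loser, k=K_FACTOR):
    """Update ratings in place for one blind vote (winner preferred over loser)."""
    win_prob = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - win_prob)  # larger gain for an unexpected win
    ratings[winner] += delta
    ratings[loser] -= delta


def leaderboard(votes, initial=INITIAL_RATING, k=K_FACTOR):
    """Replay (winner, loser) votes in order and return models sorted by rating."""
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        apply_vote(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical model names and votes, for illustration only.
    sample_votes = [("model_a", "model_b"), ("model_c", "model_a"), ("model_a", "model_b")]
    for name, rating in leaderboard(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each vote moves the same number of points from loser to winner, the total rating pool stays constant, which is why ratings are only meaningful relative to one another.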
Best Local TTS Solution
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voice selection, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free, cloud-based option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
Microsoft SpeechT5 Lands on Hugging Face: A Unified Multi-Purpose Model for Speech Synthesis, Recognition, and Conversion ★ 75
Hugging Face Blog · 1,266 days ago · Release
Microsoft's SpeechT5 model has been officially integrated into Hugging Face's Transformers library. This represents a significant advancement in the field of…