Noiz AI, HKUST & Tsinghua Open-Source Audio Generation Model: 4 Steps, 0.24s on One GPU

Original: Sound in 4 Steps, 0.24 Seconds on a Single GPU! Noiz AI Teams Up with HKUST and Tsinghua to Open-Source a Large Audio Generation Model

Noiz AI, HKUST, and Tsinghua open-source an audio generation model that runs in 4 steps and 0.24 seconds on a single GPU.

Noiz AI has partnered with the Hong Kong University of Science and Technology (HKUST) and Tsinghua University to open-source a large audio generation model. Its standout claims concern efficiency: audio is produced in just four sampling steps, and inference completes in 0.24 seconds on a single GPU. The open-source release puts research-grade, low-latency audio synthesis within reach of developers and researchers worldwide.

Noiz AI, in partnership with researchers from the Hong Kong University of Science and Technology (HKUST) and Tsinghua University, has open-sourced a large-scale audio generation model that the team claims can produce audio in just four generation steps, completing inference in 0.24 seconds on a single GPU. The announcement, reported by QbitAI on June 15, 2026, highlights these two efficiency figures, which position the model as a notable contribution to the fast-moving field of generative audio.

