Latest in AI

Voxtral Transcribes at the Speed of Sound
Mistral AI News · 40 days ago · Release
Mistral AI has unveiled Voxtral, its speech transcription model built around near-real-time processing speed. The announcement, framed as a research release, positions Voxtral as a competitive alternative in the automatic speech recognition (ASR) space. The "speed of sound" framing suggests the model's key differentiator is low-latency, fast transcription suitable for demanding production workloads.
DiffusionGemma: 4x Faster Text Generation ★ 76
Hacker News (AI keywords) · 47 days ago · Release
Google released DiffusionGemma, a 26B MoE experimental open model using text diffusion instead of token-by-token autoregressive decoding. It can generate blocks of text in parallel, reaching up to 4x faster output on dedicated GPUs. The model targets local, speed-sensitive workflows, but Google says its output quality is below standard Gemma 4 and recommends Gemma 4 for quality-critical production use.
Real-Time LLM Inference on Standard GPUs at 3k Tokens/s per Request
Hacker News (AI keywords) · 60 days ago · Benchmark
The post’s title indicates a performance claim for real-time LLM inference on standard GPUs, reporting 3,000 tokens per second per request. No article body is available, so the underlying model, GPU type, batch size, latency profile, precision, serving stack, and benchmark method are not stated. The item is best treated as an inference-performance benchmark claim rather than a verified deployment guide.
Lessons from Vercel in Practice: Why We Removed 80% of Our AI Agent's Tools ★ 85
Vercel Changelog · 218 days ago · Opinion
When building AI applications, developers often fall into the trap of "more tools equals a smarter Agent." In early versions of Vercel's AI assistants and…
Benchmarking Text Generation Inference (TGI): How to Quantify and Optimize LLM Inference Performance ★ 75
Hugging Face Blog · 790 days ago · Tutorial
This official Hugging Face blog post takes an in-depth look at how to benchmark Text Generation Inference (TGI), Hugging Face's open-source LLM inference and…
Hugging Face Partners with Artificial Analysis to Launch an LLM Performance and Cost Leaderboard ★ 75
Hugging Face Blog · 816 days ago · New Tool
Hugging Face has announced a partnership with the independent AI performance analytics firm Artificial Analysis, officially integrating its "LLM Performance…
Accelerating More Than 130,000 Hugging Face Models with ONNX Runtime ★ 75
Hugging Face Blog · 1,028 days ago · New Tool
Hugging Face officially announced a deep collaboration with Microsoft to integrate ONNX Runtime (ORT) into the Hugging Face ecosystem. This partnership enables…
Fetch Cuts Machine Learning Processing Latency by 50% with Amazon SageMaker and Hugging Face
Hugging Face Blog · 1,061 days ago · Business
This case study examines how Fetch, a leading consumer rewards platform in the United States, leveraged the collaboration between Amazon SageMaker and Hugging…
Hugging Face Introduces Assisted Generation: A New Direction Toward Low-Latency Text Generation ★ 85
Hugging Face Blog · 1,174 days ago · Release
Large language models (LLMs) typically generate text using an "autoregressive" mechanism, meaning the model must generate one token at a time. Each generation…
Case Study: Millisecond Latency with Hugging Face Infinity and Modern CPUs
Hugging Face Blog · 1,657 days ago · New Tool
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…