Latest in AI

Showing: vram-optimization · Developers

Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax ★ 72
r/LocalLLaMA top day · 49 days ago · New Tool
Luce Spark is an open-source MoE offload system for running 33B-35B A3B models on 16 GB-class GPUs. It keeps frequently routed experts on the GPU, stores the long tail in system RAM, and swaps cold experts through a bounded async cache. The author reports 13.3 GiB for Qwen3.6 35B-A3B and about 100 tok/s with Spark optimizations, but notes that testing on a real 16 GB GPU is still missing.
A Deep Dive into Hugging Face Diffusers Quantization Backends: Running Large Diffusion Models Efficiently on Consumer GPUs ★ 80
Hugging Face Blog · 433 days ago · Tutorial
As diffusion models (such as Flux.1 and Stable Diffusion 3) continue to grow in parameter count, often reaching tens of billions or even hundreds of billions…
Hugging Face Introduces Remote VAE: Optimizing Image Decoding and VRAM Usage on Inference Endpoints ★ 75
Hugging Face Blog · 519 days ago · Release
In generative AI, latent diffusion models (such as Stable Diffusion and FLUX.1) operate in two main stages: first, denoising and generation take…
Memory-Efficient Diffusion Transformers (DiT) with Quanto and Diffusers ★ 80
Hugging Face Blog · 728 days ago · Release
Background and challenges: as generative AI technology evolves, image and video generation models are increasingly transitioning from traditional UNet…
Unlocking Longer Text Generation: A Deep Dive into Key-Value (KV) Cache Quantization ★ 80
Hugging Face Blog · 803 days ago · Tutorial
During the inference process of large language models (LLMs), the self-attention mechanism needs to store the Key and Value vectors of historical tokens (i.e…
Running the DeepFloyd IF Model on Free-Tier Google Colab with 🧨 diffusers
Hugging Face Blog · 1,189 days ago · Tutorial
Core background and challenges: DeepFloyd IF is an advanced text-to-image model released by DeepFloyd, a research lab under Stability AI. Unlike the…
An Optimization Story: Inference Optimization in Practice for the Very Large BLOOM Model
Hugging Face Blog · 1,385 days ago · Tutorial
This Hugging Face technical blog post documents in detail the practical process of optimizing inference for BLOOM, the open-source multilingual large…