r/LocalLLaMA top day | Jun 7, 2026, 6:05 PM | /u/Kahvana

NVFP4 Support Merged in llama.cpp: How to Use 4-bit Blackwell Quantization

Original: NVFP4 on llama.cpp?

A Reddit discussion explores how to convert and run NVFP4 (NVIDIA FP4) quantized models on llama.cpp using Blackwell GPUs.

Following the merge of native NVFP4 (NVIDIA FP4) support in llama.cpp, users are exploring how to use this format on Blackwell GPUs such as the RTX 50-series. The discussion centers on two questions: how to convert NVFP4 safetensors checkpoints (such as Gemma 4 QAT) to GGUF, and whether an importance matrix (imatrix) is required. Because Blackwell GPUs support FP4 arithmetic in hardware, native NVFP4 support promises performance gains for local LLM inference on these cards.
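For readers new to the format, here is a minimal NumPy sketch of NVFP4-style fake quantization (quantize, then immediately dequantize). It assumes the publicly described layout: blocks of 16 FP4 E2M1 values, one FP8 E4M3 scale per block, and one FP32 scale per tensor. This is not llama.cpp's implementation. The function names (`round_to_e4m3`, `quantize_nvfp4`) are hypothetical, and the scale-selection and rounding choices are assumptions made for illustration.

```python
import numpy as np

# Non-negative magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float64)
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
BLOCK = 16         # assumed NVFP4 micro-block size

def round_to_e4m3(x):
    """Round non-negative values to the nearest FP8 E4M3 value, saturating at 448."""
    x = np.clip(np.asarray(x, dtype=np.float64), 0.0, E4M3_MAX)
    # Values below 2**-6 are subnormal and share a fixed spacing of 2**-9.
    exp = np.floor(np.log2(np.maximum(x, 2.0 ** -6)))
    step = 2.0 ** (exp - 3)  # E4M3 has 3 mantissa bits
    return np.minimum(np.round(x / step) * step, E4M3_MAX)

def quantize_nvfp4(w):
    """Fake-quantize a 1-D array (length divisible by 16); returns dequantized values."""
    w = np.asarray(w, dtype=np.float64).reshape(-1, BLOCK)
    amax = np.abs(w).max()
    if amax == 0.0:
        return np.zeros(w.size)
    # Second level: one FP32 per-tensor scale so that block scales fit E4M3's range.
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX)
    # First level: one per-block scale, rounded to FP8 E4M3.
    block_amax = np.abs(w).max(axis=1, keepdims=True)
    block_scale = round_to_e4m3(block_amax / E2M1_MAX / tensor_scale)
    scale = block_scale * tensor_scale
    safe = np.where(scale == 0.0, 1.0, scale)  # avoid dividing by zero for all-zero blocks
    scaled = np.clip(w / safe, -E2M1_MAX, E2M1_MAX)
    # Snap each magnitude to the nearest E2M1 grid point and restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)
w_hat = quantize_nvfp4(w)
print("max abs error:", np.abs(w - w_hat).max())
print("bits/weight:", 4 + 8 / BLOCK)  # 4.5 (per-tensor FP32 scale ignored)
```

Under these assumptions the storage cost is about 4.5 bits per weight: 4 bits per element plus one 8-bit scale shared by 16 elements, with the single per-tensor FP32 scale adding a negligible amount. The two-level scheme lets the small FP8 block scales cover a tensor whose overall magnitude may lie outside E4M3's range.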


Summary compiled by AI; the original post is authoritative.