NVFP4 Support Merged in llama.cpp: How to Use 4-bit Blackwell Quantization
r/LocalLLaMA top day·22 hours ago·Commentary
Following the merge of native NVFP4 (NVIDIA FP4) support in llama.cpp, users are exploring how to leverage this format on Blackwell GPUs (such as the RTX 50-series). The discussion focuses on converting NVFP4 safetensors (like Gemma 4 QAT) to GGUF format and whether importance matrices (imatrix) are required. This enablement promises significant performance gains for local LLM execution on next-gen hardware.