OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
r/LocalLLaMA top day·21 hours ago·Paper
OSCAR applies offline-precomputed rotation matrices—derived from spectral covariance analysis—to reshape KV tensor distributions before 2-bit quantization, suppressing outliers and reducing rounding error. The rotation adds negligible inference overhead since it requires no runtime learning. GGUF downloads for Gemma-4-12B-it, Qwen3-32B, and Qwen3-4B-Thinking are available, with llama.cpp and sglang integrations and an arXiv paper.