r/LocalLLaMA top day | Jun 9, 2026, 7:00 PM | /u/pmttyji

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

OSCAR uses precomputed spectral covariance-aware rotation matrices to enable 2-bit KV cache quantization with minimal quality loss.

OSCAR applies rotation matrices, precomputed offline from spectral covariance analysis, to reshape KV tensor distributions before 2-bit quantization, suppressing outliers and reducing rounding error. Because the rotations are fixed ahead of time and nothing is learned at runtime, the added inference overhead is negligible. GGUF downloads are available for Gemma-4-12B-it, Qwen3-32B, and Qwen3-4B-Thinking, along with llama.cpp and sglang integrations and an arXiv paper.
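To make the rotate-then-quantize idea concrete, here is a minimal NumPy sketch. It rests on assumptions: the rotation is taken to be the eigenbasis of a calibration covariance matrix, and the quantizer is a plain per-row min/max 2-bit scheme. Neither is confirmed by the summary as OSCAR's actual construction, and the function names (fit_rotation, quantize_2bit, dequantize_2bit) are hypothetical. The sketch only shows the mechanics: an orthogonal rotation computed once offline, applied before quantization, and undone afterwards. It makes no claim about reproducing OSCAR's quality results.

```python
import numpy as np

def fit_rotation(calib: np.ndarray) -> np.ndarray:
    """Offline step (assumed): orthogonal matrix from the eigenvectors
    of the covariance of calibration vectors."""
    centered = calib - calib.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(calib)
    _, eigvecs = np.linalg.eigh(cov)  # symmetric input -> orthonormal columns
    return eigvecs

def quantize_2bit(x: np.ndarray):
    """Per-row asymmetric quantization to codes in {0, 1, 2, 3}."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / 3.0  # 3 steps between 4 levels
    codes = np.clip(np.round((x - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize_2bit(codes, scale, lo):
    """Map 2-bit codes back to approximate floating-point values."""
    return codes * scale + lo

rng = np.random.default_rng(0)
head_dim = 64  # hypothetical head dimension
calib_keys = rng.standard_normal((1024, head_dim))  # stand-in calibration data

R = fit_rotation(calib_keys)  # computed once, offline

keys = rng.standard_normal((8, head_dim))
queries = rng.standard_normal((8, head_dim))

# Runtime: rotate, quantize for storage, then dequantize and rotate back.
codes, scale, lo = quantize_2bit(keys @ R)
keys_hat = dequantize_2bit(codes, scale, lo) @ R.T
```

Since R is orthogonal, rotating both queries and keys leaves the attention scores unchanged: (queries @ R) @ (keys @ R).T equals queries @ keys.T up to floating-point error. That is why a fixed rotation can be inserted without retraining the model.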

OSCAR (Offline Spectral Covariance-Aware Rotation) is a KV cache quantization technique aimed at reducing inference memory use in large language models (LLMs). Its core goal is to compress the key-value (KV) cache used by the attention mechanism to 2-bit precision while preserving model output quality as much as possible.
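As a rough illustration of the memory at stake, the arithmetic below compares a 16-bit KV cache with a 2-bit one. The model dimensions are hypothetical placeholders, not the configuration of any model named above, and the estimate ignores the per-group scales and zero-points that a real 2-bit format also stores.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits: int) -> int:
    """Bytes needed for keys and values, ignoring quantization metadata."""
    elements = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys + values
    return elements * bits // 8

# Hypothetical configuration, chosen only to make the numbers round.
cfg = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)

fp16 = kv_cache_bytes(**cfg, bits=16)
q2 = kv_cache_bytes(**cfg, bits=2)

print(f"16-bit: {fp16 / 2**30:.2f} GiB, 2-bit: {q2 / 2**30:.2f} GiB, "
      f"ratio {fp16 / q2:.0f}x")
# prints "16-bit: 4.00 GiB, 2-bit: 0.50 GiB, ratio 8x"
```

The 8x ratio follows directly from 16 / 2. In practice the quantization metadata makes the effective saving somewhat smaller.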



Summaries are AI-generated; the original article is authoritative.