Latest in AI

LocalLLaMA User Weighs QAT Gemma 31B GGUF Quants for RTX 3060
r/LocalLLaMA top day · 47 days ago · Commentary
A Reddit user with an RTX 3060 12GB and 32GB DDR3 RAM is evaluating new Gemma 31B GGUF quantizations based on QAT (quantization-aware training). They currently run an older Unsloth Gemma 31B IQ3_XXS build at long context, offloading some tensors and the mmproj to CPU. The post asks which Q2-Q3 quant to choose, whether QAT changes quality expectations, and whether MTP (multi-token prediction) would help or hurt under tight VRAM limits.
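A rough weight-size estimate shows why a 12GB card pushes this user toward Q2-Q3 quants: quantized weights take about parameters × bits-per-weight ÷ 8 bytes. The sketch below is a back-of-envelope calculation under stated assumptions. It takes the 31B parameter count from the title, uses illustrative bits-per-weight values (2.5, 3.0, 3.5) rather than the measured size of any published GGUF file, and treats `reserve_gib` as a hypothetical allowance for KV cache and buffers (which grows with context length). The mmproj file is not counted.

```python
# Back-of-envelope estimate of quantized weight size vs. a VRAM budget.
# ASSUMPTIONS: 31B params (from the post title); bpw values are illustrative
# placeholders, not sizes of any specific Gemma 31B GGUF; reserve_gib is a
# hypothetical allowance for KV cache and compute buffers.

GIB = 1024 ** 3


def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate size of quantized weights in GiB: params * bits / 8."""
    return n_params * bpw / 8 / GIB


def cpu_spill_gib(n_params: float, bpw: float, vram_gib: float, reserve_gib: float) -> float:
    """GiB of weights that would not fit on the GPU after the reserve."""
    usable = max(vram_gib - reserve_gib, 0.0)
    return max(weight_gib(n_params, bpw) - usable, 0.0)


if __name__ == "__main__":
    n_params = 31e9
    vram_gib = 12.0    # RTX 3060 12GB
    reserve_gib = 2.0  # hypothetical; long contexts need more
    for bpw in (2.5, 3.0, 3.5):
        w = weight_gib(n_params, bpw)
        spill = cpu_spill_gib(n_params, bpw, vram_gib, reserve_gib)
        print(f"{bpw:.1f} bpw: ~{w:.1f} GiB weights, ~{spill:.1f} GiB offloaded to CPU")
```

Under these placeholder numbers, every extra half bit per weight adds roughly 1.8 GiB, so small bpw differences between quants decide how much has to be offloaded.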
Windows 11 tops 1B users as Microsoft previews RTX Spark AI PCs
INSIDE 硬塞 AI · 55 days ago · Hardware
Microsoft announced at Computex 2026 that Windows 11 has surpassed one billion users, framing the milestone as a base for its next PC strategy. This fall, AI laptops powered by NVIDIA RTX Spark are expected to arrive, emphasizing local inference. Microsoft also plans broader mainstream hardware upgrades to prepare Windows PCs for future AI agent workflows.
Reachy Mini goes fully local
Hugging Face Blog · 62 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
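A cascaded design like this can be pictured as four swappable stages run in order. The following is a minimal sketch under stated assumptions: the stage signatures, the `CascadedPipeline` class, and the `toy_*` functions are hypothetical, do not reflect the speech-to-speech library's actual API, and omit the WebSocket layer entirely.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# HYPOTHETICAL stage signatures for illustration; the real library's
# interfaces are not described in the source and may differ.
VadFn = Callable[[bytes], Optional[bytes]]  # speech segment, or None for silence
SttFn = Callable[[bytes], str]              # audio -> transcript
LlmFn = Callable[[str], str]                # transcript -> reply text
TtsFn = Callable[[str], bytes]              # reply text -> audio


@dataclass
class CascadedPipeline:
    """Runs VAD -> STT -> LLM -> TTS in sequence; each stage is swappable."""
    vad: VadFn
    stt: SttFn
    llm: LlmFn
    tts: TtsFn

    def handle(self, audio_chunk: bytes) -> Optional[bytes]:
        speech = self.vad(audio_chunk)
        if speech is None:
            return None  # no speech detected, so later stages are skipped
        transcript = self.stt(speech)
        reply = self.llm(transcript)
        return self.tts(reply)


# Toy stand-ins for real local models; none of these call an actual model.
def toy_vad(chunk: bytes) -> Optional[bytes]:
    return chunk if any(chunk) else None  # all-zero bytes count as silence


def toy_stt(segment: bytes) -> str:
    return "hello"


def toy_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedPipeline(vad=toy_vad, stt=toy_stt, llm=toy_llm, tts=toy_tts)

if __name__ == "__main__":
    print(pipeline.handle(bytes(4)))     # silent chunk
    print(pipeline.handle(b"\x01\x02"))  # non-silent chunk
```

In this sketch, swapping a backend (for example, a different LLM server) amounts to passing a different callable for that one stage, which is the kind of substitution the tutorial describes.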