Offline CPU Voice Loop for Ollama and LM Studio Agents

Original: I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

A Reddit developer shared an MIT-licensed offline voice stack for local LLM agents using CPU-only VAD, STT, and TTS.

A r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports measured CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.

A post on r/LocalLLaMA describes a new open-source voice interface intended to let users speak with local LLM agents without relying on cloud audio services, a dedicated GPU, or a macOS-only workflow. The author says the project was motivated by frustration with existing voice setups that either required GPU acceleration, sent audio outside the machine, or had limited operating system support. The result is a fully local voice loop that can connect to local model runtimes such as Ollama, LM Studio, or vLLM while keeping speech detection, transcription, and synthesis on the CPU.

Full summary

Free shows the 3-line summary; Pro unlocks the full deep summary (~300 words) so you never have to click through.

See Pro plans →

Summaries are AI-generated; the original article is authoritative.