Jetson Orin NX Build for Hermes Agent + Benchmarking

A Jetson Orin NX was rebuilt as a quiet local LLM box for Hermes Agent benchmarking.

The post describes turning an unused Jetson Orin NX into a compact local LLM server for Hermes Agent testing. The goals were low noise, generation above 10 tok/s, prompt processing at 300 tok/s, a context window of at least 65K tokens, and a custom case. After testing Gemma 4, Qwen 3.6, and many quant variants, the author reports that Gemma 4 26B A4B UD Q2_K_XL reached a 66K context window and still generated at 10.21 tok/s near 60K tokens of context.
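The targets above can be compared against the reported numbers with a few lines of Python. This is only an illustrative sketch, not the author's benchmarking method: the names `TARGETS`, `check_targets`, and `reported` are hypothetical, "K" is assumed to mean 1,000 tokens, and the prompt-processing speed is left as `None` because this summary does not state a measured value.

```python
from typing import Optional

# Targets as listed in the summary: (threshold, strict comparison?).
# Assumption: "K" means 1,000 tokens; "over 10 tok/s" is read as strictly greater.
TARGETS = {
    "generation_tok_s": (10.0, True),
    "prompt_tok_s": (300.0, False),
    "context_tokens": (65_000, False),
}


def check_targets(measured: dict) -> dict:
    """Return True/False per target, or None when no measurement is available."""
    results: dict = {}
    for name, (threshold, strict) in TARGETS.items():
        value: Optional[float] = measured.get(name)
        if value is None:
            results[name] = None  # nothing to compare against
        elif strict:
            results[name] = value > threshold
        else:
            results[name] = value >= threshold
    return results


# Figures reported for Gemma 4 26B A4B UD Q2_K_XL in the summary.
reported = {
    "generation_tok_s": 10.21,  # measured near 60K tokens of context
    "prompt_tok_s": None,       # not stated in this summary
    "context_tokens": 66_000,
}

status = check_targets(reported)
print(status)
```

Keeping unmeasured values as `None` instead of `False` separates "target missed" from "no data", which matters here because only two of the three numeric goals have a reported figure.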

This r/LocalLLaMA post is a hands-on write-up of local LLM testing, focused on implementation and hardware modding. The author previously ran a large LLM server and has now repurposed a Jetson Orin NX that had sat idle since a robotics project was discontinued. The board dates from the Llama-7B era, but the author believes that recent progress in MoE architectures and smaller models has made this kind of low-power edge hardware worth testing again. They therefore set out to turn it into a compact local inference host capable of running agent workloads.

Summaries are AI-generated; the original article is authoritative.