Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super

Original: 1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

A Reddit benchmark tests Bonsai LM 1-bit and 1.58-bit models on a Jetson Orin Nano Super across its power modes.

A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp with CUDA. The tests compare the 7W, 15W, 25W, and MAXN power modes on latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best balance of efficiency and performance for models up to 4B, while Bonsai-8B may favor 15W when lower power draw matters.

A Reddit post in r/LocalLLaMA summarizes a benchmark of five Bonsai LM models, covering both 1-bit and 1.58-bit variants from 1.7B parameters up to roughly 8B, running on an NVIDIA Jetson Orin Nano Super 8GB. The author used llama.cpp with CUDA and tested the device across all four power modes: 7W, 15W, 25W, and MAXN. The stated goal was to measure practical edge-inference behavior, including time to first token, tokens per second, tokens per joule, overall request latency, and thermal stability, while exploring the appeal of very low-bit models with small memory footprints.
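
As a rough illustration of how the metrics named above relate to one another, the sketch below derives decode throughput and tokens per joule from a single request. The `RunSample` type, its field names, and the sample values are hypothetical and are not taken from the post; the post's exact metric definitions are not given here, so the formulas are one reasonable assumption (throughput over the time after the first token, energy as average power times total latency).

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    """One benchmark request. All fields are hypothetical measurements."""
    tokens_generated: int
    first_token_s: float  # time to first token (TTFT), in seconds
    total_s: float        # end-to-end request latency, in seconds
    avg_power_w: float    # mean board power during the request, in watts


def decode_tokens_per_second(s: RunSample) -> float:
    # Assumed definition: generated tokens divided by the time spent after
    # the first token arrived.
    return s.tokens_generated / (s.total_s - s.first_token_s)


def energy_joules(s: RunSample) -> float:
    # Energy is power integrated over time; with only an average power
    # reading, that reduces to watts * seconds.
    return s.avg_power_w * s.total_s


def tokens_per_joule(s: RunSample) -> float:
    return s.tokens_generated / energy_joules(s)


# Illustrative placeholder values only, not results from the benchmark.
sample = RunSample(tokens_generated=120, first_token_s=1.0,
                   total_s=6.0, avg_power_w=10.0)
print(decode_tokens_per_second(sample), tokens_per_joule(sample))
```

Under these definitions, a higher power mode improves tokens per joule only if throughput rises faster than power draw, which is why a middle mode can come out ahead of the maximum one.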


Summaries are AI-generated; the original article is authoritative.