r/LocalLLaMA top day | Jun 9, 2026, 1:11 PM | /u/maximecb

Rust-native CPU-only LFM2.5-8B-A1B inference library "bebelm" published as cargo crate

Original: I put together a Rust-native, CPU-only implementation of LFM2.5-8B-A1B

Developer releases bebelm, a Rust-native CPU-only inference library for LFM2.5-8B-A1B that decodes at ~37 tokens/s on a Ryzen 9 7950X while using ~7 GB of RAM.

Community developer maximecb has published bebelm, a Rust-native, GPU-free inference implementation of Liquid AI's LFM2.5-8B-A1B model, available on crates.io. Decode speed reaches ~37 tokens/s on a Ryzen 9 7950X with a memory footprint of about 7 GB; prefill is not yet optimized and currently runs at roughly the same speed as decode. The library supports tool-use callbacks, weight sharing across multiple Agent instances that each keep an independent KV cache, and Agent cloning, which lets several agents reuse an already-processed shared prompt instead of repeating its prefill.

This post comes from Reddit r/LocalLLaMA, where author maximecb shares bebelm, a local inference library for the LFM2.5-8B-A1B model written entirely in Rust from scratch, published to crates.io so that Rust developers can include it directly via Cargo.
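The weight-sharing and cloning pattern described above can be sketched in plain Rust. This is a minimal illustration only, not bebelm's actual API: the `Weights` and `Agent` types, their fields, and the `prefill`, `cached_tokens`, `param_count`, and `fork_demo` functions are all hypothetical stand-ins, and the "KV cache" here is just a list of token IDs. The idea it shows is that read-only weights can sit behind an `Arc` so every agent points at the same allocation, while each agent owns its cache, so cloning an agent after it has processed a shared prompt copies only the cache.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the model's read-only weights.
struct Weights {
    params: Vec<f32>,
}

/// Hypothetical agent: shares weights via `Arc`, owns its own cache.
/// Cloning copies the cache but only bumps the weights' reference count.
#[derive(Clone)]
struct Agent {
    weights: Arc<Weights>,
    kv_cache: Vec<u32>, // stand-in: one entry per token already processed
}

impl Agent {
    fn new(weights: Arc<Weights>) -> Self {
        Agent { weights, kv_cache: Vec::new() }
    }

    /// Stand-in for prefill: record each prompt token in this agent's cache.
    fn prefill(&mut self, tokens: &[u32]) {
        self.kv_cache.extend_from_slice(tokens);
    }

    fn cached_tokens(&self) -> usize {
        self.kv_cache.len()
    }

    fn param_count(&self) -> usize {
        self.weights.params.len()
    }
}

/// Process a shared prompt once, then fork two agents from that state.
/// Returns (weight owners, base cache len, agent A cache len, agent B cache len).
fn fork_demo() -> (usize, usize, usize, usize) {
    let weights = Arc::new(Weights { params: vec![0.0; 1024] });

    let mut base = Agent::new(weights);
    base.prefill(&[1, 2, 3]); // shared prompt, processed a single time

    // Each clone starts with the shared prompt already in its cache.
    let mut a = base.clone();
    let mut b = base.clone();
    a.prefill(&[10]); // diverges from here on
    b.prefill(&[20, 21]);

    // Both forks still read the same weight allocation.
    assert_eq!(a.param_count(), b.param_count());

    (
        Arc::strong_count(&base.weights),
        base.cached_tokens(),
        a.cached_tokens(),
        b.cached_tokens(),
    )
}
```

Calling `fork_demo()` from a `main` function exercises the sketch. If prefill really does run at about the quoted decode rate of ~37 tokens/s, a 1,000-token shared prompt would take roughly 27 seconds to process, which is why reusing that work across agents matters.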



Summaries are AI-generated; the original article is authoritative.