Packed twin inference doubles Qwen3.6-27B throughput on one MI50
r/LocalLLaMA top day·15 hours ago·Benchmark
A LocalLLaMA user shared an early packed-twin-inference experiment for local LLM acceleration.
The idea resembles speculative decoding, but runs the same quantized model side by side instead of relying on a smaller draft model.
On a single AMD MI50, the author reports Qwen3.6-27B decode speed rising from 19.4 to 38.1 tokens/s, with quantization at Q8 or lower as the main target.
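To put the reported numbers in perspective, here is a minimal sketch of the arithmetic behind the headline. It uses only the two throughput figures quoted above; the helper names `speedup` and `ms_per_token` are made up for illustration and are not part of the author's code.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput (tokens per second)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tps / baseline_tps


def ms_per_token(tps: float) -> float:
    """Average time spent per generated token, in milliseconds."""
    if tps <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tps


# Figures as reported in the post (single AMD MI50, Qwen3.6-27B).
baseline_tps = 19.4
packed_tps = 38.1

ratio = speedup(baseline_tps, packed_tps)
print(f"speedup: {ratio:.2f}x")
print(
    f"time per token: {ms_per_token(baseline_tps):.1f} ms"
    f" -> {ms_per_token(packed_tps):.1f} ms"
)
```

By this calculation the gain is about 1.96x, so "doubles" in the title is a slight rounding up, and the average time per token drops from roughly 51.5 ms to roughly 26.2 ms.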