Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues

Original: Tried to benchmark Google’s new on-device dictation models (Eloquent) and basically couldn’t

A Reddit tester says Google Eloquent often drops large parts of dictations, making clean benchmarking difficult despite competitive accuracy on complete outputs.

A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
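To make the word error rate point concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the app's output, divided by the number of reference words. This is not the tester's code. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting, which a real evaluation would likely extend.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a transcript that drops most of a dictation is scored as a long run of deletions, so averaging fragments together with complete transcripts would hide how accurate the complete ones are.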

A Reddit post in r/LocalLLaMA describes an attempt to benchmark Google’s new on-device dictation app, Eloquent, and concludes that the app was too unreliable to evaluate cleanly. According to the author, Google shipped a fully local dictation app built on new proprietary models, which prompted a comparison with leading open speech-recognition models such as Qwen3-ASR and NVIDIA Parakeet V3. The plan was to reuse the author’s existing evaluation setup: a harness that plays audio clips through a virtual input device into a dictation app, captures the text the app pastes, and compares results across the same clips. The author also reports having around 1,500 manually corrected clips from their own daily engineering work.
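The harness described above can be sketched roughly as follows. This is a hypothetical outline, not the author's implementation: the audio playback and paste capture are stood in for by a caller-supplied `dictate` function, and the names `ClipResult`, `is_truncated`, `run_benchmark`, and `truncation_rate`, as well as the 0.6 word-count threshold, are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ClipResult:
    clip_id: str
    reference: str
    output: str
    truncated: bool


def is_truncated(reference: str, output: str, min_ratio: float = 0.6) -> bool:
    """Flag an output as a fragment when it has far fewer words than the reference.

    The 0.6 ratio is an arbitrary illustrative threshold.
    """
    ref_words = len(reference.split())
    out_words = len(output.split())
    return ref_words > 0 and out_words < min_ratio * ref_words


def run_benchmark(
    clips: Iterable[tuple[str, str]],
    dictate: Callable[[str], str],
    min_ratio: float = 0.6,
) -> list[ClipResult]:
    """Run each (clip_id, reference) pair through the app under test.

    `dictate` is a placeholder for the real work: playing the clip into a
    virtual input device and returning whatever text the app pasted.
    """
    results = []
    for clip_id, reference in clips:
        output = dictate(clip_id)
        truncated = is_truncated(reference, output, min_ratio)
        results.append(ClipResult(clip_id, reference, output, truncated))
    return results


def truncation_rate(results: list[ClipResult]) -> float:
    """Fraction of clips whose output was flagged as a fragment."""
    if not results:
        return 0.0
    return sum(r.truncated for r in results) / len(results)
```

Keeping the truncation flag separate from any accuracy score lets a report state both figures: how often the app returned only a fragment, and how accurate it was on the clips it transcribed in full.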

Summaries are AI-generated; the original article is authoritative.