Apple Announced a New On-Device Inference Engine for Apple Silicon

Original: Apple announced new on device inference engine for Apple Silicon

Apple announced CoreAI, a new on-device inference engine aimed at Apple Silicon devices.

Apple announced CoreAI at WWDC. The post frames it as a possible future replacement for CoreML and as an alternative to MLX, llama.cpp, and torch for optimized on-device inference. Models still need to be converted through Python scripts, and the currently supported models appear to date mostly from mid-2025. No performance data is available yet. The author expects it may trail MLX on GPU, but Apple’s claim of a 20B on-device foundation model suggests that larger app-bundled models could become possible.

This r/LocalLLaMA post summarizes CoreAI, which Apple announced at WWDC; the author notes that the news appears to have received relatively little attention. According to the post, CoreAI is Apple’s new on-device inference engine for Apple Silicon. It can be seen as a possible future replacement for CoreML, and it can be compared with other local inference and machine-learning runtimes such as MLX, llama.cpp, and torch. The author emphasizes that its main targets are Apple devices such as phones and tablets, with the goal of running models on-device, closer to the hardware.

Summaries are AI-generated; the original article is authoritative.