Latest in AI

Packed twin inference doubles Qwen3.6-27B throughput on one MI50
r/LocalLLaMA top day · 50 days ago · Benchmark
An r/LocalLLaMA user shared an early packed-twin-inference experiment for local LLM acceleration. The idea resembles speculative decoding, but runs the same quantized model side-by-side instead of pairing it with a smaller draft model. On a single AMD MI50, the author reports Qwen3.6-27B improving from 19.4 to 38.1 tokens/s, with Q8-or-lower quantization as the main target.
JetBrains Mellum 2: a really good and performant model
r/LocalLLaMA top day · 50 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses that this is a practical, workflow-oriented test rather than a scientific benchmark.
Mercor’s Brendan Foody calls out Sequoia over dual-pricing valuation tricks
TechCrunch AI · 50 days ago · Business
TechCrunch reports that Mercor’s Brendan Foody called out Sequoia over alleged dual-pricing valuation practices. The article says Sequoia is one of several top firms that sell the same equity at two different prices. The story centers on transparency, valuation signaling, and how AI startup equity may be priced in venture markets.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 50 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
A llama.cpp CLI Command Builder
r/LocalLLaMA top day · 50 days ago · New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware; only Linux is currently supported.
Domain Search is now available through the Vercel CLI
Vercel Changelog · 50 days ago · Release
Vercel has added domain search functionality to its CLI, enabling developers to query domain availability directly from the command line. Previously, this required switching to the Vercel web dashboard, adding friction to deployment workflows. The update keeps more actions within the terminal, reducing context-switching for keyboard-driven developers.
Migrating Your GitHub CI to Hugging Face Jobs
Hugging Face Blog · 50 days ago · Tutorial
Hugging Face has released an official guide for developers looking to move their GitHub CI pipelines to Hugging Face Jobs, a compute service designed for ML workloads. The platform offers GPU-ready infrastructure that sits closer to models and datasets on the Hub, reducing latency and transfer costs compared to generic GitHub Actions runners. The tutorial covers workflow translation, authentication, resource configuration, and status reporting back to GitHub PRs.
Vercel Connect: Secure Access to External Services for Your Agents
Vercel Changelog · 50 days ago · New Tool
Vercel has introduced Connect, a new capability that lets AI agents running on its platform securely reach external services and APIs. The feature addresses one of agentic deployment's sharpest pain points: safely brokering credentials and connections to outside tools without exposing secrets in application code. With Connect, Vercel extends its platform role from web-app host to managed infrastructure layer for production AI agents.
How Code and Theory Cut Time-to-Prototype 75% with v0
Vercel Changelog · 50 days ago · Business
Code and Theory, a digital experience agency, achieved a 75% reduction in time-to-prototype by integrating Vercel's AI-powered UI generation tool v0 into their design and development workflow. The case study, published by Vercel, highlights how generative UI tooling can dramatically compress early-stage product iteration cycles. It positions v0 as a practical accelerant for agencies balancing client speed expectations with design quality.
How Fern Runs Multi-Tenant Docs for Webflow and ElevenLabs on Vercel
Vercel Changelog · 50 days ago · Business
Fern, a developer documentation platform, leverages Vercel to power white-labeled, multi-tenant documentation sites for enterprise API clients including Webflow and ElevenLabs. The case study highlights how Fern isolates per-tenant content and deployments at scale on Vercel's edge network. This architecture lets Fern onboard new documentation customers without provisioning separate infrastructure for each.
Budgets for API Keys on Vercel AI Gateway
Vercel Changelog · 50 days ago · Release
Vercel has added per-API-key budget controls to its AI Gateway product, enabling developers to set hard spending limits on individual keys. Once a key hits its budget threshold, the gateway automatically blocks further requests, preventing unexpected cost overruns. This is especially useful for multi-tenant apps, team cost allocation, and isolating dev/test environments from production spending.
Pipeline parallelism in llama.cpp may be wasting your VRAM
r/LocalLLaMA top day · 50 days ago · Benchmark
The author compared three llama.cpp Vulkan builds: default 4 sched copies, 1 sched copy, and no pipeline parallelism. In their Qwen GGUF test, input and output throughput were nearly identical across all configurations. However, the default setting used about 1.5GB more VRAM for compute buffers and reduced usable context from roughly 113K tokens to around 88K, though parallel-request benefits were not tested.
Siri AI at WWDC 2026 ★ 72
Simon Willison's Weblog · 50 days ago · Commentary
Simon Willison says Apple’s 2024 Apple Intelligence rollout made him cautious, so he will believe the WWDC 2026 Siri AI claims only after seeing results. He notes the new features look more feasible, especially with a custom Gemini-derived model running on Private Cloud Compute. He also highlights vision LLM screen understanding and the new Core AI library for running PyTorch-derived models on Apple hardware.
Tools for Humanity reportedly plans layoffs amid revenue struggles
TechCrunch AI · 50 days ago · Business
TechCrunch reports that Tools for Humanity, Sam Altman’s identity verification company, is struggling to generate revenue and will downsize its staff. The report does not specify how many employees are affected, which teams are involved, or any financial figures. The story matters mainly as a business signal around AI-adjacent identity verification and the difficulty of turning high-profile technology narratives into durable revenue.
Apple’s WWDC AI demos looked more real after $250M false ad settlement
TechCrunch AI · 50 days ago · Commentary
TechCrunch notes that Apple’s WWDC 2026 AI demos felt more concrete and realistic, often showing people holding iPhones in use-case scenarios. The framing matters after Apple’s $250 million settlement over allegedly misleading Siri and Apple Intelligence advertising. The piece focuses less on model breakthroughs and more on Apple’s shift toward demos that look deliverable, usable, and legally safer.
Apple is using AI to fix Safari’s extension problem
The Verge AI · 50 days ago · Release
Apple is trying to address Safari’s weaker extension ecosystem with AI. Safari has long lagged behind rival browsers in extension availability, partly because of Apple’s stricter development requirements. In a demo shared by Apple, the company showed users effectively “vibe coding” their own Safari extensions, though the excerpt does not detail model support, review flow, or release timing.
Show HN: Command Center, the AI coding env for people who care about quality
Hacker News (AI keywords) · 50 days ago · New Tool
Command Center (cc.dev) launched on Hacker News as an AI coding environment tailored for developers who value code quality over sheer volume. It aims to address common pitfalls of AI code generation, such as bloat and technical debt, by offering precise context control. The tool targets professional software engineers seeking a more reliable and high-quality AI-assisted workflow.
Quick note on recent QAT issues
r/LocalLLaMA top day · 50 days ago · Commentary
The post argues that recent Google QAT quantization has several implementation problems, including token embeddings being quantized to q6k instead of using a pure mode. It also claims llama-quantize has a hardcoded parameter that mismatches some optimized groups, and that 32-block groups are misaligned. The author recommends Unsloth UD Q4_K_XL as a temporary option and says they are working on a patch.
OpenAI files for IPO, following Anthropic ★ 74
The Verge AI · 50 days ago · Business
OpenAI announced Monday that it confidentially submitted a Form S-1 with the US Securities and Exchange Commission. The move follows Anthropic, which reportedly made the same filing step on June 1. The Verge frames this as part of an IPO race between the two AI rivals, but the report does not provide timing, valuation, or offering details.
Following Anthropic, OpenAI files confidentially for IPO ★ 78
TechCrunch AI · 50 days ago · Business
OpenAI said Monday in a blog post that it has confidentially filed for an initial public offering. The move comes a little over a week after Anthropic, its main rival, also filed to go public. TechCrunch notes that OpenAI was last valued at $852 billion post-money, making the filing a major marker in the AI sector’s race toward public markets.
Apple plays catch-up at WWDC
TechCrunch AI · 50 days ago · Commentary
Apple spent much of its WWDC keynote on fixes, performance improvements, and long-requested features before unveiling an upgraded AI-powered Siri. The sequencing suggests Apple wants users to see AI as one piece of a larger software-improvement effort. TechCrunch frames the event as Apple playing catch-up, rather than leading with AI as the sole headline.
Developer Runs Half-Life at 30 FPS on a 2007 Nokia N95
Hacker News (AI keywords) · 50 days ago · Hardware
A developer reportedly managed to run Half-Life at 30 FPS on a Nokia N95, a smartphone originally released in 2007. Based on the title alone, the item appears to be a retro hardware and gaming-porting story rather than an AI development. The main significance is technical novelty: demonstrating an old mobile device handling a classic PC game at a playable frame rate.
Apple bets cheaper AI will woo small developers
TechCrunch AI · 50 days ago · Business
Apple is trying to make AI experimentation cheaper for smaller developers. According to TechCrunch, developers with fewer than 2 million first-time App Store downloads will have cloud API costs waived. The report frames this as a way to attract smaller teams as AI development and experimentation become increasingly expensive.
llama.cpp PR adds MTP support for Gemma-4 E2B and E4B assistants
r/LocalLLaMA top day · 50 days ago · Release
The Reddit post links to ggml-org/llama.cpp Pull Request #24282, which adds MTP support for Gemma-4 E2B and E4B assistants. The submitter frames it as useful for tiny Gemma models on phones, low-end machines, Raspberry Pi, or similarly constrained devices. The post does not include benchmarks, merge status, or setup instructions, so it should be treated as a development signal rather than a finished release.
Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 50 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. Built with open-source maintainers and extensive quality control, it shows current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Arguing with an AI bot posting outdated Llama 3.1 takes
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA post jokes about arguing with an AI bot that posted outdated commentary involving Llama 3.1. The author says such bots should enable web search instead of relying on stale knowledge. The post also mocks exaggerated model testimonial posts, using Qwen3.6 27B as a sarcastic example, which makes it more of a community quality complaint than technical news.
Qwen3.6-35B-A3B Tool Calling Benchmark: ByteShape vs Unsloth GGUFs
r/LocalLLaMA top day · 50 days ago · Benchmark
The post benchmarks eight Qwen3.6-35B-A3B GGUF quants from ByteShape and Unsloth using llama.cpp and tool-eval-bench. It compares f16, q8_0, and q4_0 KV cache quantization under short and long-context pressure, totaling 144 runs and roughly 300 GPU-hours. The author reports no clear ByteShape versus Unsloth winner, q8_0 as close to a free lunch, q4_0 as weaker, and long context as a major tool-calling degradation factor.
Say hi to Siri AI: Apple announces more conversational voice assistant ★ 76
Ars Technica AI · 50 days ago · Release
Apple announced “Siri AI,” a more conversational version of its voice assistant planned for this fall. The update is tied to a two-tier AI model overhaul powered in part by Google technology. The move signals Apple’s attempt to close the gap with modern AI assistants while preserving its system-level integration and privacy-focused positioning.
Was BitNet a dead end? What happened to ternary LLMs?
r/LocalLLaMA top day · 50 days ago · Commentary
An r/LocalLLaMA user questions whether BitNet and ternary LLMs were a dead end after earlier promise around efficient low-bit models. The post notes that the largest ternary model appears to remain around 2B parameters. It asks why frontier open-weight AI labs are not visibly pursuing the approach, but provides no technical evidence or definitive answer.
Apple Reveals New AI Architecture Built Around Google Gemini Models ★ 78
Hacker News (AI keywords) · 50 days ago · Release
Apple announced a major Apple Intelligence overhaul built around Apple Foundation Models co-developed with Google using technologies behind Gemini. The architecture supports on-device and Private Cloud Compute execution, with stronger reasoning, understanding, and multimodal capabilities. A new system orchestrator coordinates AI features across Apple platforms, though Apple has not yet specified which devices receive the higher-power model.
