Latest in AI

Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 49 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. Built with open-source maintainers and extensive quality control, it shows current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Say hi to Siri AI: Apple announces more conversational voice assistant ★ 76
Ars Technica AI · 49 days ago · Release
Apple announced “Siri AI,” a more conversational version of its voice assistant planned for this fall. The update is tied to a two-tier AI model overhaul powered in part by Google technology. The move signals Apple’s attempt to close the gap with modern AI assistants while preserving its system-level integration and privacy-focused positioning.
Apple Reveals New AI Architecture Built Around Google Gemini Models ★ 78
Hacker News (AI keywords) · 49 days ago · Release
Apple announced a major Apple Intelligence overhaul built around Apple Foundation Models co-developed with Google using technologies behind Gemini. The architecture supports on-device and Private Cloud Compute execution, with stronger reasoning, understanding, and multimodal capabilities. A new system orchestrator coordinates AI features across Apple platforms, though Apple has not yet specified which devices receive the higher-power model.
Gemini 3.5 and Antigravity come to Google NotebookLM
Ars Technica AI · 49 days ago · Release
Google is upgrading NotebookLM with Gemini 3.5 and Antigravity, pushing the product beyond source-based Q&A into more agentic research workflows. The update adds a secure cloud computer for each notebook, enabling code execution, deeper analysis, and richer file outputs. For now, availability is limited to AI Ultra and enterprise customers, with broader rollout planned later.
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared quick throughput numbers for Gemma4 QAT with MTP speculative decoding on an RTX 3090 (24GB). They report a roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark across 11 task categories and notes run-to-run variation from sampling at temperature 1.0.
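The 31B figures reduce to a simple ratio of throughputs. A minimal sketch using only the approximate numbers quoted in the post; the helper names `speedup` and `seconds_for` are illustrative, not from the post:

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Throughput ratio: new tokens/s divided by baseline tokens/s."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_tps / baseline_tps


def seconds_for(n_tokens: int, tps: float) -> float:
    """Wall-clock time to generate n_tokens at a steady rate (ignores prompt processing)."""
    return n_tokens / tps


baseline = 40.0          # ~40 tok/s for Gemma 4 31B without MTP, as reported
with_mtp = [70.0, 80.0]  # ~70-80 tok/s with MTP, as reported

for tps in with_mtp:
    print(f"{baseline:.0f} -> {tps:.0f} tok/s: {speedup(baseline, tps):.2f}x, "
          f"1,000 tokens in {seconds_for(1000, tps):.1f}s vs {seconds_for(1000, baseline):.1f}s")
```

The two endpoints work out to 1.75x and 2.0x, so the 31B case sits at or slightly above the top of the post's overall 1.2-1.8x range; given temperature 1.0 sampling, both are best read as rough figures.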
Gemma 4 Chat Template now has preserve thinking
r/LocalLLaMA top day · 50 days ago · Release
An r/LocalLLaMA post notes that Gemma 4’s chat template now has “preserve thinking.” The linked discussion points to google/gemma-4-31B-it on Hugging Face, suggesting a template-level change rather than a new model release or benchmark. The original post does not provide detailed usage notes, defaults, compatibility information, or measured effects.
Upgrading agentic coding capabilities with the new Devstral models ★ 72
Mistral AI News · 50 days ago · Release
Mistral AI announced two Devstral updates focused on agentic coding workflows: Devstral Small 1.1 and Devstral Medium. Devstral Small 1.1 remains a 24B Apache 2.0 open model and reaches 53.6% on SWE-Bench Verified. Devstral Medium reaches 61.6%, is available through Mistral’s API, and supports private deployment and custom finetuning for enterprises.
Voxtral ★ 78
Mistral AI News · 50 days ago · Release
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
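At the quoted starting price, transcription cost scales linearly with audio length. A back-of-the-envelope sketch, assuming simple per-minute billing at $0.001 with no minimums, tiers, or rounding; Mistral's actual billing rules may differ:

```python
PRICE_PER_MINUTE_USD = 0.001  # starting API price quoted by Mistral; flat per-minute billing is an assumption


def transcription_cost_usd(minutes: float, price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimated USD cost for a given number of audio minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * price_per_minute


for label, minutes in [("1 hour", 60), ("100 hours", 6_000), ("10,000 hours", 600_000)]:
    print(f"{label:>12}: ${transcription_cost_usd(minutes):,.2f}")
```

Under this assumption an hour of audio costs about six cents, so even 10,000 hours stays in the hundreds of dollars.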
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI · 50 days ago · Regulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
ElevenAPI
ElevenLabs Blog · 50 days ago · New Tool
ElevenAPI is a developer category on the ElevenLabs blog rather than a single detailed article. It collects updates and tutorials around speech, music, conversational agents, API keys, web components, and integrations. Listed posts mention Lovable, ElevenLabs UI, Music API, Claude 3.7 Sonnet, Gemini 2.0 Flash, DeepSeek R1, Voice Isolator API, timestamped TTS endpoints, and Speech-to-Speech API.
Introducing Claude Opus 4.8 ★ 82
Anthropic News · 50 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, with stronger benchmark performance across coding, agentic skills, reasoning, and knowledge work. The release also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and new Messages API support for system entries inside the messages array. Pricing for regular usage remains unchanged, while fast mode is now cheaper than previous models.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day · 50 days ago · Opinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
Google's Official Gemma 4 QAT Q4_0 GGUFs Have Higher Precision Than Unsloth's Q4_K_XL
r/LocalLLaMA top day · 50 days ago · Commentary
An analysis of Gemma 4 QAT GGUF files reveals that Google's official 'Q4_0' releases actually employ a mixed-precision strategy. For smaller models like E2B and E4B, Google keeps critical token embeddings in Q6_K and certain projection weights in F16. This makes Google's Q4_0 files larger and more precise than Unsloth's 'Q4_K_XL' versions, which default to standard Q4_0 for almost all tensors.
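To see why mixed precision makes files larger, compare storage per weight. The sketch below is a simplified NumPy imitation of ggml's Q4_0 scheme as we understand it: blocks of 32 weights share one fp16 scale and each weight gets a 4-bit code, for 4.5 bits per weight. The function names are hypothetical and the output is not byte-compatible with GGUF, which packs two codes per byte; the bits-per-weight table reflects the standard ggml block layouts for Q4_0 and Q6_K.

```python
import numpy as np

BLOCK = 32  # Q4_0-style: 32 weights share one scale

# Storage cost per weight, including per-block scales (standard ggml block layouts)
BITS_PER_WEIGHT = {"Q4_0": (BLOCK * 4 + 16) / BLOCK, "Q6_K": 210 * 8 / 256, "F16": 16.0}


def quantize_q4_0_like(x: np.ndarray):
    """Quantize a 1-D float array (length divisible by 32) to 4-bit codes plus fp16 block scales."""
    blocks = x.astype(np.float32).reshape(-1, BLOCK)
    # The signed value with the largest magnitude in each block sets that block's scale
    idx = np.argmax(np.abs(blocks), axis=1)
    extreme = blocks[np.arange(blocks.shape[0]), idx]
    scale = extreme / -8.0
    safe = np.where(scale == 0, 1.0, scale)
    inv = np.where(scale == 0, 0.0, 1.0 / safe)
    # Map to integer codes 0..15; code 8 represents zero
    codes = np.clip(np.floor(blocks * inv[:, None] + 8.5), 0, 15).astype(np.uint8)
    return codes, scale.astype(np.float16)


def dequantize_q4_0_like(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights: (code - 8) * block scale."""
    return ((codes.astype(np.float32) - 8.0) * scale.astype(np.float32)[:, None]).reshape(-1)


rng = np.random.default_rng(0)
w = rng.standard_normal(4 * BLOCK).astype(np.float32)
codes, scale = quantize_q4_0_like(w)
err = np.abs(w - dequantize_q4_0_like(codes, scale)).max()
print(f"max abs error: {err:.3f}")
for fmt, bpw in BITS_PER_WEIGHT.items():
    print(f"{fmt}: {bpw:g} bits/weight")
```

Keeping a tensor in Q6_K costs about 46% more space than Q4_0, and F16 more than 3.5x, which is why selectively upgrading embeddings and projections yields larger but more precise files.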
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark ★ 75
r/LocalLLaMA top day · 50 days ago · Benchmark
A Reddit user shared results from a custom evaluation harness in which Google's Gemma 4 31B (FP8) performed on par with Claude Sonnet 4.6 Medium. The harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. If the result holds beyond this single harness, it suggests quantized mid-sized open models are closing the gap with leading proprietary frontier models.
User Shares Gemma 4 QAT Experience: Improved Quality and MTP Speedups
r/LocalLLaMA top day · 50 days ago · Opinion
A Reddit user shared their experience with the Gemma 4 31B QAT (Quantization-Aware Training) model. Compared to traditional GGUF quants like Q6_K_L, the QAT version delivers noticeable quality improvements in roleplay and long-context tasks. Combining the QAT model with Multi-Token Prediction (MTP) also produced large speedups, raising generation speed from ~20 t/s to as much as 50 t/s (about 2.5x).
MTP and QAT: What is the Relation? Running Gemma 4 31B in llama.cpp
r/LocalLLaMA top day · 50 days ago · Commentary
A popular Reddit thread addresses user confusion over running Gemma 4 31B locally. It distinguishes between MTP (Multi-Token Prediction for inference speedup) and QAT (Quantization-Aware Training for preserving 4-bit quality). It also confirms that llama.cpp's new MTP support requires updated GGUF files and a secondary draft model file for acceleration.
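MTP's speed gain comes from draft-and-verify (speculative) decoding: cheap draft predictions propose several tokens, and the full model keeps only those it agrees with, so output is unchanged while fewer sequential steps are needed. A toy, greedy-only sketch in which plain functions stand in for the target and draft models; the names and next-token rules are hypothetical, and this is not llama.cpp's implementation, which also handles sampled decoding:

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # toy greedy "model": context -> next token id


def speculative_step(target: NextToken, draft: NextToken, ctx: List[int], k: int) -> List[int]:
    """Draft k tokens, keep the prefix the target agrees with, then add one target token."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))
    accepted: List[int] = []
    for tok in proposal:
        expected = target(ctx + accepted)  # a real engine scores all k positions in one batched pass
        if tok != expected:
            accepted.append(expected)      # first mismatch: take the target's token and stop
            return accepted
        accepted.append(tok)
    accepted.append(target(ctx + accepted))  # every draft token matched: one bonus target token
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int], n: int,
             k: int = 4) -> Tuple[List[int], int]:
    """Generate n tokens; also return how many draft-and-verify steps were needed."""
    out, steps = list(prompt), 0
    while len(out) - len(prompt) < n:
        out += speculative_step(target, draft, out, k)
        steps += 1
    return out[len(prompt):len(prompt) + n], steps


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10  # usually right, wrong after a 5


tokens, steps = generate(toy_target, toy_draft, [0], n=12)
print(tokens, f"in {steps} steps instead of 12")
```

The output matches what the target alone would produce; the draft only changes how many sequential verification steps it takes, which is why a poor draft costs speed but not quality.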
I design with Claude more than Figma now
Hacker News (AI keywords) · 51 days ago · Opinion
Jane Street designer Edwin Morris describes moving from skepticism about LLMs to using Claude as a core design tool. Instead of relying mainly on specs and Figma mockups, he now builds working prototypes directly in the real codebase. The post also explores the collaboration risks: prototypes must remain disposable proposals, not finished features that shut reviewers out of design input.
Here comes new Siri again
The Verge AI · 52 days ago · Commentary
The Verge frames Apple as behind in AI, but argues that lagging may not be entirely bad. At WWDC, Apple appears ready to introduce the new Siri again after earlier Apple Intelligence promises slipped. The key question is whether Apple can turn AI into a reliable, system-level assistant experience rather than another generic chatbot feature set.
Mantine DataTable source repo compromised; owner account suspended ★ 74
Hacker News (AI keywords) · 52 days ago · Incident
A GitHub security notice says Mantine DataTable and other repositories received unauthorized commits through the github-actions bot. The npm packages were reported safe; the risk is to developers who recently cloned or pulled the source and then opened it in VS Code, Cursor, Claude Code, or Gemini, or ran npm test. A later update links the payload to the Miasma / Shai-Hulud worm family and names a stolen credential as the likely entry point.
This is your laptop… on AI
The Verge AI · 52 days ago · Hardware
The episode frames developer conference season around Big Tech’s conviction that AI will reshape how people use technology. Nvidia CEO Jensen Huang is highlighted for describing a completely new way to use laptops. Based on the provided excerpt, this is more of an industry commentary on AI PCs than a concrete product-spec report.
The token bill comes due: Inside the scramble to manage AI costs ★ 78
TechCrunch AI · 52 days ago · Business
TechCrunch reports that enterprise AI spending has shifted from rapid adoption to cost control. Even as per-token prices fall, broader AI rollout and agentic coding tools are multiplying consumption, pushing companies over budget. A new Tokenomics Foundation under the Linux Foundation aims to standardize AI token cost tracking, billing metrics, and efficiency language.
Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG ★ 72
Google Research Blog · 53 days ago · Release
Google Research and Google Cloud introduced an agentic RAG framework hosted on Gemini Enterprise Agent Platform. It uses multiple agents to plan, rewrite, route, retrieve, verify sufficient context, iterate, and synthesize answers. Google reports up to 34% factuality accuracy gains over standard RAG, plus 90.1% accuracy in a cross-corpus FramesQA setting with similar latency to single-corpus retrieval.
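Google describes multiple LLM agents, but the control flow it lists (plan, rewrite, route, retrieve, verify sufficient context, iterate, synthesize) can be sketched with plain functions. In this toy version every "agent" is keyword matching over two hypothetical in-memory corpora; none of the names correspond to a Gemini Enterprise Agent Platform API:

```python
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "an", "what", "does", "is", "how", "do", "i", "of", "and", "for", "by"}

# Hypothetical toy corpora standing in for separate document stores
CORPORA: Dict[str, List[str]] = {
    "products": ["The X100 router supports WiFi 7.", "The X100 includes a 2-year warranty."],
    "support": ["Firmware 3.2 fixes X100 dropouts.", "Reset the X100 by holding the button for 10 seconds."],
}


def terms(text: str) -> Set[str]:
    return {w.strip(".,?").lower() for w in text.split()} - STOPWORDS


def plan(question: str) -> List[str]:
    """Planner stand-in: split a compound question into sub-questions."""
    return [part.strip() for part in question.split(" and ") if part.strip()]


def route(query: Set[str]) -> List[str]:
    """Router stand-in: choose corpora whose documents share any term with the query."""
    return [name for name, docs in CORPORA.items() if any(query & terms(d) for d in docs)]


def retrieve(query: Set[str], corpora: List[str]) -> str:
    """Retriever stand-in: best document by term overlap across the routed corpora."""
    candidates = [d for name in corpora for d in CORPORA[name]]
    return max(candidates, key=lambda d: len(query & terms(d)))


def vocabulary() -> Set[str]:
    return set().union(*(terms(d) for docs in CORPORA.values() for d in docs))


def agentic_rag(question: str, max_iters: int = 3) -> List[str]:
    """Plan, then per sub-question: route, retrieve, verify sufficiency, rewrite and retry if needed."""
    evidence: List[str] = []
    for sub in plan(question):
        query = terms(sub)
        for _ in range(max_iters):
            corpora = route(query)
            if not query or not corpora:
                break  # nothing left to search for
            doc = retrieve(query, corpora)
            if query <= terms(doc):  # sufficiency check: the document covers every query term
                evidence.append(doc)
                break
            query &= vocabulary()  # rewrite: drop terms no corpus uses, then retry
    return evidence  # a real system would have an LLM synthesize the final answer from this


print(agentic_rag("What warranty does the X100 have and how do I reset the X100?"))
```

In the example, the first sub-question fails the sufficiency check because "have" appears in no document; the rewrite drops it and the second pass succeeds, which is the plan-verify-iterate loop in miniature.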
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 54 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 54 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT-5.5 succeeded in 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer runs. Many other models failed because of refusals, tunnel vision on the API, false positives, or an inability to exploit the exposed Firebase path correctly.
How LLMs Actually Work
Hacker News (AI keywords) · 54 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
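The attention and KV-cache ideas the article covers fit in a few lines of NumPy. A single-head sketch, without batching, multi-head splitting, or GQA, that checks whether token-by-token decoding with a cache reproduces full causal attention; the class and function names are illustrative:

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention over a whole sequence; each position sees only itself and the past."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where a key lies in the future
    return softmax(np.where(future, -np.inf, scores)) @ V


class KVCache:
    """Keeps keys and values of earlier tokens so each decoding step handles only the new token."""

    def __init__(self, d: int):
        self.K = np.zeros((0, d))
        self.V = np.zeros((0, d))

    def step(self, q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
        self.K = np.vstack([self.K, k])  # real engines preallocate instead of re-stacking
        self.V = np.vstack([self.V, v])
        scores = self.K @ q / np.sqrt(q.shape[0])
        return softmax(scores) @ self.V


rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.standard_normal((T, d))  # stand-in token embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

full = causal_attention(Q, K, V)
cache = KVCache(d)
incremental = np.stack([cache.step(Q[t], K[t], V[t]) for t in range(T)])
print(np.allclose(full, incremental))  # True: caching changes cost, not results
```

The cache trades memory for compute: each new token reuses stored keys and values instead of recomputing them, which is also why techniques like GQA that shrink the number of key/value heads matter for long contexts.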
Google's Gemma 4 12B is designed to run on 16GB RAM laptops
Ars Technica AI · 54 days ago · Release
Google introduced Gemma 4 12B, an open model aimed at running locally on laptops with 16GB of RAM. The model uses a new encoding scheme and token prediction to improve efficiency relative to its size. Its practical importance depends on real-world benchmarks, but it could lower the barrier for private, offline, and local multimodal AI workflows.
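Whether a 12B model fits in 16GB of RAM is largely a question of bits per weight. A rough weights-only estimate, assuming exactly 12 billion parameters and ignoring the KV cache, activations, runtime overhead, and the operating system's own memory use; it does not model the new encoding scheme the article mentions:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 12e9  # assumption: "12B" taken as exactly 12 billion parameters

for label, bpw in [("BF16", 16.0), ("8-bit", 8.0), ("Q4_0 (4.5 bpw)", 4.5)]:
    print(f"{label:>15}: {weight_memory_gb(N_PARAMS, bpw):5.2f} GB")
```

At 16 bits the weights alone (24 GB) exceed a 16GB laptop, at 8 bits they would take 12 GB and leave little headroom, and a ~4-bit format (6.75 GB) leaves room for the KV cache and the rest of the system.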
Microsoft Build: MAI-Thinking-1 and MAI Family Models ★ 78
Latent Space · 55 days ago · Release
Microsoft used Build to present itself as both an AI platform and a first-party model lab, announcing seven MAI models across reasoning, code, image, transcription, and voice. The standout was MAI-Thinking-1, described as a 35B active MoE with 256K context and clean data lineage. The recap also ties the launches to GitHub Copilot, Windows agent runtime ambitions, Web IQ grounding APIs, Foundry distribution, and MAIA 200 hardware.
Show HN: Paseo - Beautiful open-source coding agent interface
Hacker News (AI keywords) · 55 days ago · New Tool
Paseo provides one interface for tools such as Claude Code, Codex, Copilot, OpenCode, and Pi. It runs agents through a local daemon on the user's own machine and supports desktop, mobile, web, and CLI clients. Its appeal is multi-agent orchestration and cross-device control, though real adoption depends on workflow fit, security, and reliability.
Gemini Spark is the most impressive and terrifying AI experience I’ve had yet
The Verge AI · 56 days ago · Opinion
Trip planning has become a recurring showcase for AI agents: name a destination, and the system promises to search options and research local activities. The article frames Gemini Spark as the author’s most impressive and unsettling AI experience so far. The provided excerpt does not include enough detail to assess its workflow, accuracy, limitations, or the specific reason for that concern.
Gemini's new AI agent is about as good as Google's demo
The Verge AI · 56 days ago · New Tool
Google's new 24/7 AI agent, Gemini Spark, can take on tasks for users and continue working on them. After receiving access last week, The Verge's reviewer found that Spark can perform surprisingly well, roughly matching Google's demo. The remaining question is whether that capability justifies the financial cost and potential privacy tradeoffs.
