Latest in AI

LocalLLaMA post tier list
r/LocalLLaMA top day · 49 days ago · Opinion
The author proposes a tier list for r/LocalLLaMA posts in response to complaints about declining post quality. Top-tier posts include new local model releases with GGUF/MLX or benchmark data, meaningful optimizations, complete hardware performance reports, and well-analyzed research. Low-tier posts include repeated toy benchmarks, unrelated cloud AI chatter, AI-generated slop, and thinly disguised ads for Claude-wrapper startups.
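Interpreting the Current Performance Gap Between Open-Source and Closed-Source AI Models: Beyond the Myth of a Single Evaluation Metric ★ 75
Interconnects (Nathan L.) · 98 days ago · Opinion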
In today's AI landscape, the performance gap between open-weights models (such as Meta's Llama family) and closed-source models (such as OpenAI's GPT and…
Gemma 4 and the Keys to Open-Source Model Success: Why Benchmark Scores Are No Longer the Only Metric ★ 75
Interconnects (Nathan L.) · 115 days ago · Commentary
This article takes a deep dive into the release of Google's latest open-source model Gemma 4, using it as an opportunity to re-examine the core factors that…
Google DeepMind Introduces a "Cognitive Framework" for Assessing AGI Progress and Launches a Kaggle Hackathon to Build New Evaluation Standards ★ 85
Google DeepMind Blog · 132 days ago · Release
As large language models (LLMs) advance rapidly, traditional AI evaluation benchmarks (such as MMLU, GSM8K, and others) are quickly facing the twin challenges…
Import AI 446: Nuclear LLMs, China's Large-Scale AI Benchmark, AI Evaluation and Policy ★ 75
Import AI (Jack Clark) · 155 days ago · Commentary
In this edition of Import AI 446, author Jack Clark explores three highly forward-looking and interconnected topics in current AI development: Nuclear LLMs…
Import AI 445: The Timing of Superintelligence, AI Cracking Frontier Math Proofs, and a New Machine Learning Research Benchmark ★ 75
Import AI (Jack Clark) · 162 days ago · Commentary
In this edition of Import AI (Issue 445), author Jack Clark guides readers through three core topics at the very frontier of AI development: the timeline for…
Import AI 444: LLM Sociology, Huawei Uses AI to Write Operating System Kernels, and the Chip Design Benchmark ChipBench ★ 75
Import AI (Jack Clark) · 169 days ago · Commentary
This edition of Import AI (Issue 444), written by Jack Clark, delves into the latest breakthroughs in artificial intelligence across three domains: social…
Opus 4.6, Codex 5.3, and the Post-Benchmark Era: How Should We Evaluate AI Models in 2026? ★ 80
Interconnects (Nathan L.) · 169 days ago · Opinion
In 2026, with the release of next-generation models such as Anthropic's Opus 4.6 and OpenAI's Codex 5.3, the AI community faces a fundamental challenge…
Hugging Face Launches Community Evals: Stop Blindly Trusting Black-Box Leaderboards and Let the Community Judge Models ★ 85
Hugging Face Blog · 174 days ago · Release
In today's era of rapid AI advancement, major model vendors and research institutions are releasing all manner of "leaderboards" to claim their models surpass…
Google Releases Gemini 2.5 Flash: Dominating the Cost-Performance Frontier, with a First-of-Its-Kind Precisely Controllable "Thinking Budget" ★ 85
TLDR AI (Buttondown) · 466 days ago · Release
Google has officially released its new model Gemini 2.5 Flash, marking Google's comprehensive dominance over the cost-efficiency Pareto frontier on LMArena…
OpenAI Launches Its New Workhorse Model GPT 4.1: A New Balance of Performance and Practicality ★ 85
TLDR AI (Buttondown) · 469 days ago · Release
OpenAI has officially released its new flagship model GPT 4.1, positioned as the next-generation "workhorse" designed to give developers and enterprises the…
Hugging Face and Upstage Launch the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem
Hugging Face Blog · 889 days ago · Release
Hugging Face and South Korea's leading AI startup Upstage have jointly announced the launch of the "Open Ko-LLM Leaderboard." This is a brand-new evaluation…