Latest in AI

Is It Agentic Enough? Benchmarking Open Models on Your Own Tooling
Hugging Face Blog · 40 days ago · Benchmark
Hugging Face published a guide examining whether open-weight models are sufficiently capable for agentic workflows when tested against custom tooling rather than standardized benchmarks. The piece challenges practitioners to move beyond generic leaderboard scores and assess agent performance in the context of their own use cases. It positions open models as viable candidates for production agentic pipelines, provided evaluation is grounded in realistic tool-use scenarios.
Releasing Cohere North Mini Code
r/LocalLLaMA top day · 48 days ago · Release
Cohere’s Jay Alammar announced the official release of North Mini Code after early community feedback from r/LocalLLaMA. Weights are available on Hugging Face, including an fp8 version, and the model can be tried for free through OpenCode. For vLLM deployment, Cohere recommends using vLLM main for now and installing cohere_melody for accurate response parsing, while noting community requests for quantization and llama.cpp support.
Cohere Introduces Command A+: Next-Gen Enterprise Model Optimized for Agentic Workflows ★ 80
Cohere Blog · 50 days ago · Release
Cohere has introduced Command A+, its latest enterprise-grade model tailored for agentic workflows. According to Cohere, the model goes beyond traditional RAG, targeting multi-step reasoning, complex tool use, and multilingual tasks. It is designed to integrate with enterprise APIs in support of autonomous, reliable AI agents.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 53 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Hugging Face Blog · 53 days ago · Benchmark
ServiceNow AI published a Hugging Face Blog post titled “EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios.” Based only on the title, it appears to be a benchmark dataset update involving tool-use or scenario-based AI evaluation. The exact domains, tools, scenario design, licensing, supported models, and evaluation methodology cannot be confirmed without the full article.
Adding MCP Tools to Reachy Mini
Hugging Face Blog · 55 days ago · Tutorial
Based on the available title, this Hugging Face Blog post appears to cover adding MCP tools to Reachy Mini. The likely focus is connecting the open-source desktop robot with Model Context Protocol-based tool integrations. Since the original article text is not provided, implementation details, supported servers, models, and limitations cannot be confirmed.
Introducing Claude Opus 4.8 ★ 78
Hacker News (AI keywords) · 60 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, emphasizing benchmark gains, sharper judgment, and more reliable agentic work. The launch also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and Messages API support for system entries inside messages. Standard pricing remains unchanged, while fast mode is faster and substantially cheaper than before.
Ecom-RLVE: An Adaptive, Verifiable Reinforcement Learning Environment for E-Commerce Conversational Agents ★ 75
Hugging Face Blog · 103 days ago · Release
As large language models (LLMs) become increasingly widespread, more and more companies are attempting to deploy AI agents in e-commerce customer service and…
A Deep Dive into VAKRA: IBM Research's New Benchmark for Evaluating AI Agent Reasoning, Tool Calling, and Failure Modes ★ 75
Hugging Face Blog · 103 days ago · Release
As generative AI technology has evolved, the industry's focus has shifted from pure "Large Language Models (LLMs)" to "AI Agents" capable of autonomously…
OpenEnv in Practice: Evaluating Tool-Using AI Agents in Real-World Environments ★ 75
Hugging Face Blog · 166 days ago · New Tool
As AI agent technology advances rapidly, evaluating how these agents perform in the real world has become one of the greatest challenges…
Hugging Face Unifies the Tool Use Standard: Simplifying Open-Source LLM Agent Development ★ 85
Hugging Face Blog · 715 days ago · Release
Background and pain points: In AI agent development, "tool use" (also known as function calling) is the core capability that allows large language models…
How NuminaMath Won the First AIMO Progress Prize (AI Mathematical Olympiad) and Announced a Full Open-Source Release ★ 80
Hugging Face Blog · 747 days ago · Release
Background and achievement: The AI Mathematical Olympiad (AIMO) Progress Prize aims to advance AI models capable of solving Olympiad-level mathematical…