Latest in AI

Showing: long-context · Researchers

FlashMemory-DeepSeek-V4: Ultra-Long Context via Lookahead Sparse Attention
r/LocalLLaMA top day · 47 days ago · Paper
FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention (LSA), a predictive inference paradigm that retains only query-critical KV chunks in GPU memory instead of the full cache. A Neural Memory Indexer, trained independently using a backbone-free dual-encoder strategy, proactively forecasts which historical tokens will matter next. The system compresses average KV cache footprint by 86.5% and exceeds 90% compression at 500K-token scales, while delivering a slight accuracy gain of +0.6% on long-context benchmarks.
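The core idea of keeping only query-critical KV chunks on the GPU can be illustrated with a toy selector. This is a minimal sketch under loud assumptions, not FlashMemory's LSA: the learned Neural Memory Indexer is replaced by a hypothetical `predicted_query` vector, chunk relevance is a plain dot product against each chunk's mean key, and `select_chunks` and `evicted_fraction` are illustrative names.

```python
# Toy sketch of query-aware KV chunk retention. NOT FlashMemory's LSA:
# the learned indexer is replaced by a hypothetical `predicted_query` vector,
# and chunk relevance is a plain dot product with each chunk's mean key.
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_key(chunk: List[Vector]) -> Vector:
    """Average the key vectors of one chunk into a single summary vector."""
    dim = len(chunk[0])
    return [sum(k[i] for k in chunk) / len(chunk) for i in range(dim)]


def select_chunks(keys: List[Vector], predicted_query: Vector,
                  chunk_size: int, budget: int) -> List[int]:
    """Indices of the `budget` highest-scoring chunks, i.e. the ones that
    would stay in GPU memory; every other chunk could be offloaded."""
    chunks = [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]
    scores = [dot(mean_key(c), predicted_query) for c in chunks]
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def evicted_fraction(num_chunks: int, kept: int) -> float:
    """Share of the KV cache that no longer occupies GPU memory."""
    return 1.0 - kept / num_chunks


if __name__ == "__main__":
    keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
            [0.5, 0.5], [0.5, 0.5], [-1.0, 0.0], [-1.0, 0.0]]
    kept = select_chunks(keys, [1.0, 0.0], chunk_size=2, budget=2)
    print(kept)                               # [1, 2]
    print(evicted_fraction(4, len(kept)))     # 0.5
```

Here 4 chunks of 2 tokens are scored and 2 are kept, so half the cache leaves the GPU; the paper's reported 86.5% average reduction corresponds to keeping roughly 13.5% of the cache resident.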
Claude Mythos 5 Released: 50 Million Lines of Code in One Day ★ 74
量子位 QbitAI · 48 days ago · Release
QbitAI reports that Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small group of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities, and focuses on a safety-routing design in which Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of refusing them outright.
JetBrains Mellum 2: a really good and performant model
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT running llama.cpp with the Vulkan backend and a 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses that this is not a scientific benchmark but a practical, workflow-oriented test.
Qwen3.6-35B-A3B Tool Calling Benchmark: ByteShape vs Unsloth GGUFs
r/LocalLLaMA top day · 49 days ago · Benchmark
The post benchmarks eight Qwen3.6-35B-A3B GGUF quants from ByteShape and Unsloth using llama.cpp and tool-eval-bench. It compares f16, q8_0, and q4_0 KV cache quantization under short- and long-context pressure, totaling 144 runs and roughly 300 GPU-hours. The author reports no clear winner between ByteShape and Unsloth, finds a q8_0 KV cache to be close to a free lunch and q4_0 to be weaker, and identifies long context as a major factor in tool-calling degradation.
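Why q8_0 is nearly free while q4_0 hurts comes down to how coarse each integer grid is. The sketch below is a simplified per-block absmax quantizer in the spirit of llama.cpp's q8_0 (32 values per block, one scale per block). Assumptions: the scale is kept as a Python float (llama.cpp stores it in fp16), the 4-bit variant is a plain symmetric scheme rather than the exact q4_0 layout, and all function names are illustrative.

```python
# Simplified per-block absmax quantization in the spirit of llama.cpp's q8_0.
# Assumptions: float scale (llama.cpp uses fp16); the 4-bit path is a plain
# symmetric scheme, NOT the exact q4_0 layout.
from typing import List, Tuple

BLOCK = 32  # values that share one scale


def quantize_block(x: List[float], bits: int) -> Tuple[float, List[int]]:
    """Map each value to a signed integer in [-qmax, qmax] using one scale."""
    qmax = 2 ** (bits - 1) - 1                # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(v) for v in x)
    scale = amax / qmax if amax > 0 else 0.0
    q = [round(v / scale) if scale else 0 for v in x]
    return scale, q


def dequantize_block(scale: float, q: List[int]) -> List[float]:
    return [scale * v for v in q]


def roundtrip_max_error(x: List[float], bits: int) -> float:
    """Largest absolute error after quantizing and restoring x block by block."""
    worst = 0.0
    for i in range(0, len(x), BLOCK):
        block = x[i:i + BLOCK]
        scale, q = quantize_block(block, bits)
        restored = dequantize_block(scale, q)
        worst = max(worst, max(abs(a - b) for a, b in zip(block, restored)))
    return worst


if __name__ == "__main__":
    data = [((i * 37) % 101 - 50) / 50 for i in range(128)]  # values in [-1, 1]
    print("8-bit max error:", roundtrip_max_error(data, 8))
    print("4-bit max error:", roundtrip_max_error(data, 4))
```

With one scale per block, the worst-case error is half a quantization step: about amax/254 at 8 bits versus amax/14 at 4 bits, which is consistent with the post's finding that q8_0 costs little while q4_0 degrades more.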
Remote agents in Vibe. Powered by Mistral Medium 3.5. ★ 76
Mistral AI News · 50 days ago · Release
Mistral Medium 3.5 is a 128B dense flagship model with a 256K context window that combines instruction following, reasoning, and coding. It is now the default model in Le Chat and Mistral Vibe, enabling cloud-based remote coding agents launched from the CLI or from chat. The release also adds Le Chat Work mode for multi-step, cross-tool workflows, with visible actions and approval gates for sensitive operations.
Qwen 3.6 27B KV Cache Quantization Benchmarks: KVarN, Turbo, and TCQ Evaluated
r/LocalLLaMA top day · 51 days ago · Benchmark
Reddit user Anbeeld shared comprehensive KV cache quantization benchmarks for Qwen 3.6 27B across 75 configuration pairs. Using BeeLlama.cpp (a custom llama.cpp fork), the test evaluates q8, q6, q5, and q4 quantization levels. It specifically highlights advanced implementations like KVarN, TurboQuant, and TCQ to optimize long-context inference efficiency.
LLM Research Papers: The 2026 List (January to May)
Ahead of AI (Raschka) · 52 days ago · Paper
Sebastian Raschka compiles a curated reference list of LLM papers he bookmarked from January through May 2026. The list is not comprehensive, but organized around topics useful for future articles, lectures, code examples, and research work. Public sections emphasize reasoning, RL, efficient inference, long context, agent systems, tool use, coding agents, diffusion language models, and serving infrastructure.
[AINews] The End of Fine-Tuning? The Future and Evolution of Fine-Tuning in the Era of Large Models ★ 75
Latent Space · 76 days ago · Opinion
As AI technology continues to iterate at a rapid pace, the developer community is confronting a profound rethinking of the question: "Is fine-tuning heading…
NVIDIA Launches Nemotron 3 Nano Omni: A Long-Context Multimodal Model Designed for Document, Speech, and Video Agents ★ 75
Hugging Face Blog · 90 days ago · Release
NVIDIA has officially launched a new lightweight multimodal model, "Nemotron 3 Nano Omni." This model is designed to deliver powerful multimodal intelligence…
DeepSeek-V4: A Million-Token Context That Agents Can Actually Use ★ 85
Hugging Face Blog · 95 days ago · Release
DeepSeek has introduced its next-generation open-source model, DeepSeek-V4, whose most attention-grabbing feature is an ultra-long context window of up to 1…
Ulysses Sequence Parallelism: A Technical Breakdown of Training Models on Million-Token Contexts ★ 78
Hugging Face Blog · 141 days ago · Tutorial
As large language models (LLMs) push the demand for long context toward the million-token scale, the VRAM of a single GPU can no longer accommodate the…
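The key move in Ulysses-style sequence parallelism is an all-to-all exchange that turns a sequence-sharded layout into a head-sharded one, so each GPU can run ordinary attention over the full sequence for a subset of heads. The sketch below only tracks where data goes. Assumptions: plain Python lists stand in for GPUs and tensors, each `(token, head)` pair stands in for one head's activation vector, `seq_len` and `heads` divide evenly by the device count, and the function names are hypothetical (this is not the DeepSpeed or `torch.distributed` API).

```python
# Bookkeeping-only sketch of the Ulysses-style all-to-all. Lists stand in for
# GPUs/tensors; each (token, head) pair stands in for one head's activations.
# Assumes seq_len and heads are divisible by the device count p.
from typing import List, Tuple

Cell = Tuple[int, int]            # (token index, head index)
Shard = List[List[Cell]]          # rows = tokens, columns = heads


def shard_by_sequence(seq_len: int, heads: int, p: int) -> List[Shard]:
    """Before attention: device d holds a contiguous token block, all heads."""
    s = seq_len // p
    return [[[(t, h) for h in range(heads)] for t in range(d * s, (d + 1) * s)]
            for d in range(p)]


def all_to_all_seq_to_head(shards: List[Shard], heads: int) -> List[Shard]:
    """Device e receives head group e from every device, so it ends up with
    the full sequence for heads // p heads."""
    p = len(shards)
    g = heads // p
    out: List[Shard] = []
    for e in range(p):
        rows: Shard = []
        for src in range(p):          # source order == token order
            for row in shards[src]:
                rows.append(row[e * g:(e + 1) * g])
        out.append(rows)
    return out


if __name__ == "__main__":
    before = shard_by_sequence(seq_len=8, heads=4, p=2)
    after = all_to_all_seq_to_head(before, heads=4)
    print(len(before[0]), len(before[0][0]))  # 4 4  (half the tokens, all heads)
    print(len(after[0]), len(after[0][0]))    # 8 2  (all tokens, half the heads)
```

After attention, a second all-to-all in the opposite direction restores the sequence-sharded layout; it mirrors this one and is omitted here.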
Microsoft Releases Differential Transformer V2: Major Gains in Differential Attention Efficiency and Long-Context Performance ★ 80
Hugging Face Blog · 189 days ago · Release
Microsoft's research team has officially published Differential Transformer V2 (Diff-Transformer V2) on Hugging Face. Core Technical Background: What Is…
TimeScope: A New Benchmark for Probing the Limits of Long-Video Understanding in Video LMMs ★ 75
Hugging Face Blog · 370 days ago · Release
As large multimodal models (LMMs) have achieved breakthroughs in image and short-video understanding, the industry has gradually shifted its attention to the…
Introducing HELMET: A Next-Generation Benchmark for Comprehensive Evaluation of Long-Context LLMs ★ 80
Hugging Face Blog · 468 days ago · Release
Background and pain points: moving beyond the overly simple "needle in a haystack" test. In recent years, the context window length supported by large…
Google Launches Gemma 3: An Open-Source LLM with Multimodal, Multilingual, and Long-Context Support ★ 90
Hugging Face Blog · 503 days ago · Release
Google has officially launched Gemma 3, the next generation of its open-source large language model series — a major technical leap forward from Gemma 2. Gemma…
Mastering Long-Context Processing in LLMs with KVPress ★ 75
Hugging Face Blog · 551 days ago · New Tool
In the current trajectory of large language model (LLM) development, support for long contexts has become a standard requirement. However, as input text length…
Bamba: An Inference-Efficient Hybrid Mamba2 Open Model Released ★ 75
Hugging Face Blog · 587 days ago · Release
Background and architectural innovation: as large language models (LLMs) have advanced rapidly, the traditional Transformer architecture faces severe…
A Failed Experiment: Infini-Attention, and Why We Should Keep Trying? ★ 75
Hugging Face Blog · 713 days ago · Commentary
This Hugging Face blog post provides a detailed account of the team's attempt to reproduce and evaluate Google's proposed "Infini-Attention" mechanism — and…
Welcome Falcon Mamba: The First Strong Attention-Free 7B Language Model ★ 85
Hugging Face Blog · 715 days ago · Release
The Technology Innovation Institute (TII) of Abu Dhabi has officially released Falcon Mamba 7B, a significant milestone in the evolution of AI architectures…
Unlocking Longer Generation: A Deep Dive into Key-Value (KV) Cache Quantization ★ 80
Hugging Face Blog · 803 days ago · Tutorial
During the inference process of large language models (LLMs), the self-attention mechanism needs to store the Key and Value vectors of historical tokens (i.e…
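Storing one Key and one Value vector per token, per layer, per KV head is what makes the cache grow linearly with context length, and it is also why quantizing it pays off. The back-of-the-envelope calculator below assumes a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) chosen only for round numbers, and it ignores quantization scales and allocator overhead.

```python
# Back-of-the-envelope KV cache size. The model shape is hypothetical;
# quantization scales and allocator overhead are ignored.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   batch: int = 1, bytes_per_elem: float = 2.0) -> float:
    """Factor 2 = one Key and one Value vector per token, layer, and KV head."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 2 ** 30

if __name__ == "__main__":
    shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=131_072)
    for name, width in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        size = kv_cache_bytes(**shape, bytes_per_elem=width)
        print(f"{name}: {size / GIB:.0f} GiB")
```

For this shape at a 128K-token context, the script prints 16 GiB for fp16, 8 GiB for 8-bit, and 4 GiB for 4-bit, so each halving of precision halves the memory available for longer generation.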
Understanding BigBird's Block Sparse Attention
Hugging Face Blog · 1,945 days ago · Tutorial
Traditional Transformer models (such as BERT) are constrained by the quadratic complexity O(N²) of their self-attention mechanism, and are typically limited…
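Block sparse attention sidesteps the quadratic cost by letting each query block attend to only a constant number of key blocks. The sketch below builds a block-level mask combining the three BigBird-style ingredients: global blocks, a sliding window, and a few random blocks. Assumptions: the window size, number of global and random blocks, and the seed are illustrative rather than Hugging Face defaults, and `bigbird_block_mask` is a hypothetical helper, not the library's implementation.

```python
# Block-level sketch of a BigBird-style sparse attention pattern: global,
# sliding-window, and random blocks. Parameter values are illustrative.
import random
from typing import List


def bigbird_block_mask(num_blocks: int, window: int = 3, num_global: int = 1,
                       num_random: int = 2, seed: int = 0) -> List[List[bool]]:
    """mask[q][k] is True when query block q attends to key block k."""
    rng = random.Random(seed)
    mask = [[False] * num_blocks for _ in range(num_blocks)]
    # Treat the first and last `num_global` blocks as global.
    global_blocks = (set(range(num_global))
                     | set(range(num_blocks - num_global, num_blocks)))
    half = window // 2
    for q in range(num_blocks):
        if q in global_blocks:
            mask[q] = [True] * num_blocks     # global blocks see everything
            continue
        for k in global_blocks:
            mask[q][k] = True                 # every block sees the global blocks
        for k in range(max(0, q - half), min(num_blocks, q + half + 1)):
            mask[q][k] = True                 # local sliding window
        candidates = [k for k in range(num_blocks) if not mask[q][k]]
        for k in rng.sample(candidates, min(num_random, len(candidates))):
            mask[q][k] = True                 # a few random long-range blocks
    return mask


if __name__ == "__main__":
    n = 64
    mask = bigbird_block_mask(n)
    attended = sum(sum(row) for row in mask)
    print(attended, "of", n * n, "block pairs computed")
```

With 64 blocks this prints `560 of 4096 block pairs computed`: each non-global query block touches at most 7 key blocks regardless of sequence length, so apart from the few global rows the cost grows linearly rather than quadratically.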
Hugging Face Reading Group: Long-Range Transformers, Techniques and Evolution
Hugging Face Blog · 1,967 days ago · Commentary
In the field of natural language processing (NLP), the core of standard Transformer models (such as BERT and GPT-2) is the self-attention mechanism. However…
Reformer: An Architecture Pushing the Limits of Long-Context Language Modeling
Hugging Face Blog · 2,216 days ago · Paper
This technical blog post published by Hugging Face takes a deep dive into how the Reformer architecture overcomes the memory and computational bottlenecks that…
