Latest in AI

FCC Wants to Kill Burner Phones by Forcing Telecoms to Verify All Customers' IDs
Hacker News (AI keywords) · 49 days ago · Regulation
The FCC is proposing rules that would require telecom carriers to verify the identity of every customer before activating service. This move would eliminate anonymous prepaid 'burner phones,' long used by journalists, domestic abuse survivors, and privacy-conscious individuals. Critics warn the policy could undermine digital privacy and disproportionately harm vulnerable populations, while proponents argue it would curb fraud and criminal activity.
Can LLMs Beat Classical Hyperparameter Optimization Algorithms?
Hacker News (AI keywords) · 49 days ago · Benchmark
This paper investigates whether LLMs can serve as effective hyperparameter optimization (HPO) agents, competing with established classical methods such as Bayesian optimization, TPE, and random search. The study likely employs a systematic evaluation framework where LLMs iteratively suggest hyperparameter configurations based on task descriptions and historical evaluation results. Findings aim to clarify the practical potential and limitations of LLMs in AutoML pipelines.
Build a Basic AI Agent from Scratch: Long Task Planning
Hacker News (AI keywords) · 49 days ago · Tutorial
This source appears to be a tutorial about constructing a basic AI agent from scratch. Based only on the title, its focus is likely long-task planning: how an agent breaks a larger objective into steps and works through them over time. No article body was provided, so specific implementation choices, model providers, tools, code examples, or evaluation results cannot be confirmed.
Single-slot half-height PCIe V100 with NVLink appears in China
r/LocalLLaMA top day · 49 days ago · Hardware
An r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, capped at 75W, with another version supporting up to 300W. The 16GB model is expected around or below ¥1500, with a 32GB version reportedly planned, but it is not yet available for purchase.
Rick & Morty
r/LocalLLaMA top day · 49 days ago · Commentary
This r/LocalLLaMA top-day post is a short image meme titled “Rick & Morty.” The only accompanying text says, “nobody expected HF there,” suggesting surprise at Hugging Face (HF) appearing in the image’s context. There are no technical claims, model details, releases, or benchmarks, so its value is mainly as a small signal of community culture around Hugging Face and local LLM discussions.
Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model ★ 85
Google DeepMind Blog · 49 days ago · Release
Google DeepMind has unveiled Gemma 4 12B, an open-weights model with a unified, encoder-free multimodal architecture. Instead of using a separate vision encoder (such as a ViT), it processes multiple modalities directly within a single Transformer network. According to the announcement, this design simplifies training, reduces inference latency, and improves cross-modal alignment.
PR-CAD: Progressive Refinement for Text-to-CAD Generation with LLMs
Hacker News (AI keywords) · 49 days ago · Paper
This arXiv paper introduces PR-CAD, a framework for controllable and faithful text-to-CAD generation with large language models. It treats CAD creation and editing as one progressive refinement process rather than separate tasks. The authors curate an interaction dataset and report state-of-the-art controllability and faithfulness on public benchmarks.
Google DeepMind Launches Initiative to Power the Future of Robotics in Europe ★ 70
Google DeepMind Blog · 49 days ago · Business
Google DeepMind has unveiled a strategic initiative to power the future of robotics in Europe. The program focuses on advancing Embodied AI and physical AI through deep collaborations with European academic institutions and industry partners. By combining DeepMind's AI expertise with Europe's strong engineering foundation, the initiative aims to accelerate breakthroughs in robotic generalization and safety.
PSA: Throttle GPU Power Limits for Major Energy Savings with Minimal Inference Performance Loss
r/LocalLLaMA top day · 49 days ago · Hardware
A Reddit user reminds the local LLM community that lowering GPU power limits can save a large share of energy at minimal performance cost. On dual Radeon VII cards, cutting the limit from 250W to 100W per card reduced inference speed by less than 10%. The explanation given is that LLM inference is bound by memory bandwidth rather than compute, so it tolerates reduced GPU clock speeds better than training or rendering workloads.
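As a rough illustration of those figures, the sketch below estimates energy per generated token before and after the cap. It assumes each card actually draws its configured power limit during inference, which the post does not state. The function name and the 0.90 speed factor (the worst case consistent with "less than 10%") are illustrative choices, not values from the post.

```python
def energy_per_token_ratio(power_before_w: float,
                           power_after_w: float,
                           speed_retained: float) -> float:
    """Energy per token after the cap, relative to before.

    Energy per token is power divided by tokens per second, so the ratio is
    (power_after / power_before) / speed_retained.
    Assumption: the card draws exactly its power limit in both cases.
    """
    if power_before_w <= 0 or power_after_w <= 0:
        raise ValueError("power limits must be positive")
    if not 0 < speed_retained <= 1:
        raise ValueError("speed_retained must be in (0, 1]")
    return (power_after_w / power_before_w) / speed_retained


# Figures from the summary: 250W -> 100W per card, speed drop under 10%.
ratio = energy_per_token_ratio(250.0, 100.0, 0.90)
savings = 1.0 - ratio
print(f"energy per token: {ratio:.2f}x of baseline, saving {savings:.0%}")
```

Under these assumptions the cap cuts energy per token by a little more than half. If real draw under a memory-bound load is already below the 250W limit, the actual saving would be smaller.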
Apple Announced a New On-Device Inference Engine for Apple Silicon
r/LocalLLaMA top day · 49 days ago · Release
Apple announced CoreAI at WWDC, which the post frames as a possible future replacement for CoreML and an alternative to MLX, llama.cpp, and torch for optimized on-device inference. Models still need conversion through Python scripts, and current supported models appear mostly from mid-2025. No performance data is available yet; the author expects it may trail MLX on GPU, but Apple’s 20B on-device foundation model claim suggests larger app-bundled models could become possible.
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search
Hacker News (AI keywords) · 49 days ago · Paper
Echoing the famous Transformer paper, this work asks whether grep alone is sufficient for agentic search scenarios. The study focuses on 'agent harnesses'—the scaffolding wrapping an LLM, including prompting strategy, tool access, and memory—as the primary driver of search quality. Findings suggest harness design may matter more than the underlying model, challenging the community's focus on model scaling.
Rust-native CPU-only LFM2.5-8B-A1B inference library "bebelm" published as cargo crate
r/LocalLLaMA top day · 49 days ago · New Tool
Community developer maximecb has published bebelm, a Rust-native, GPU-free inference implementation of Liquid AI's LFM2.5-8B-A1B model, available on crates.io. Decode speed reaches ~37 tokens/s on a Ryzen 9 7950X with a ~7GB memory footprint; prefill is unoptimized and currently similar in speed to decode. The library supports tool-use callbacks, weight sharing across multiple Agent instances with independent KV caches, and Agent cloning to skip repeated prefill on shared prompts.
Jetson Orin NX Build for Hermes Agent + Benchmarking
r/LocalLLaMA top day · 49 days ago · Hardware
The post describes turning an unused Jetson Orin NX into a compact local LLM server for Hermes Agent testing. The goals were low noise, over 10 tok/s generation, 300 tok/s prompt processing, at least 65K context, and a custom case. After testing Gemma 4, Qwen 3.6, and many quant variants, the author reports Gemma 4 26B A4B UD Q2_K_XL reaching 66K context and 10.21 tok/s near 60K context.
Five things you need to know about AI
MIT Tech Review AI · 49 days ago · Commentary
The article is based on a talk titled “Five things you need to know about AI,” delivered at SXSW London. The author frames it as a guide to the biggest AI themes right now, drawing partly from MIT Technology Review’s first AI10 list. From the provided excerpt, it reads as a trend-oriented editorial overview rather than a product release, paper, or technical tutorial.
NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain
Hugging Face Blog · 49 days ago · New Tool
NeuroBait is a Hugging Face community project built to help with ADHD task-initiation freeze rather than diagnosis or to-do planning. It fine-tunes google/gemma-3-12b-it with LoRA to produce short, warm, context-aware nudges. The project uses Unsloth and Modal for training, then deploys on a Hugging Face Space with Gradio, transformers, peft, and a runtime LoRA adapter.
Amap Releases ABot-Earth 0.5: Shifting from 2D Distillation to 3D Native for Consistent Scene Generation ★ 70
量子位 QbitAI · 49 days ago · Release
Amap has released ABot-Earth 0.5, its latest spatial intelligence model. Rather than relying on 2D distillation methods (such as Score Distillation Sampling), the model adopts a 3D-native architecture. The stated goal is to address multi-view inconsistency and distortion, enabling consistent 3D scene generation for autonomous driving simulation, smart cities, and digital twin mapping.
ByteDance Open-Sources Bernini, a Unified Framework for AI Video Editing ★ 74
量子位 QbitAI · 49 days ago · Release
ByteDance’s commercial technology team has open-sourced Bernini, a unified framework for AI video generation and editing. Its design separates semantic planning from visual rendering: an MLLM-based planner understands text, source videos, images, and video references, then a DiT-based renderer produces the final video. The released Bernini-R includes inference code and weights, while the full planner-enabled version is still being prepared.
A 4B Edge-Deployable Cognitive Model Built in China
量子位 QbitAI · 49 days ago · Release
QbitAI’s headline says a domestic Chinese team has built a 4B-parameter “cognitive model” suitable for edge deployment. The framing links it to a model direction previously associated with Andrej Karpathy. Since the article body was not provided, details such as the model name, architecture, benchmark results, hardware requirements, open-source status, and licensing remain unverified.
Is a New Player Joining China’s Top-Tier General AI Models?
量子位 QbitAI · 49 days ago · Commentary
Based only on the title, the article likely examines China’s domestic general-purpose AI model landscape and asks whether a new company or model is entering the top tier. It appears to be an industry observation rather than a technical paper or tutorial. Without the full text, the specific model, company, benchmark evidence, and business context cannot be verified.
Microsoft's open source tools were hacked to steal passwords of AI developers ★ 78
Hacker News (AI keywords) · 49 days ago · Incident
Microsoft temporarily removed several open source GitHub projects while investigating suspected malicious content. The affected repos were linked to Azure and developer workflows involving AI coding tools such as Claude Code, Gemini CLI, and VS Code. Security researchers said the malware could steal passwords and sensitive credentials when compromised tools were opened, though Microsoft has not disclosed how many users were affected.
FrontierCode: Benchmarking for Code Quality over Slop
Latent Space · 49 days ago · Benchmark
Latent Space briefly announced FrontierCode with the line “We made a thing!” From the title, FrontierCode appears to be a benchmark for frontier coding systems that prioritizes code quality rather than sheer code generation volume. The provided excerpt does not include methodology, model results, datasets, or tooling details, so conclusions should remain cautious.
L'Affaire Siloxane
Hacker News (AI keywords) · 49 days ago · Commentary
Pinboard founder and prominent tech critic Maciej Cegłowski published a piece titled in the style of historical French scandals, suggesting a serious controversy worth scrutiny. The word 'Siloxane' — a silicon-oxygen chemical compound and basis of silicone — likely serves as a metaphor or pseudonym for a tech or AI entity. Original article content was unavailable; details must be confirmed by reading the source directly.
Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants?
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user is looking for benchmarks comparing Gemma 4 4-bit QAT models, via Unsloth, against standard 8-bit non-QAT quantized models. They understand QAT is expected to preserve much of the BF16 baseline accuracy, but want hard numbers against traditional 8-bit PTQ. The post highlights scattered feedback but no clear head-to-head evaluation yet.
ggml-webgpu improves prefill speeds for k-quants in llama.cpp PR
r/LocalLLaMA top day · 49 days ago · Benchmark
llama.cpp PR #24225 improves ggml-webgpu matrix multiplication performance for k-quants and refactors matmul paths for Q4/Q5/Q8 and k-quants. In pp512 tests on an M2 Pro, reported speedups range from about 1.33x to 3.78x across Q2_K, Q3_K, Q4_K, Q5_K, and Q6_K. The largest gains appear on Q3_K models, including Qwen and Gemma examples.
Packed twin inference doubles Qwen3.6-27B throughput on one MI50
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared an early packed-twin-inference experiment for local LLM acceleration. The idea resembles speculative decoding, but uses the same quantized model side-by-side instead of a smaller draft model. On a single AMD MI50, the author reports Qwen3.6-27B improving from 19.4 to 38.1 tok/s, with Q8-or-lower quantization as the main target.
JetBrains Mellum 2: a really good and performant model
r/LocalLLaMA top day · 49 days ago · Benchmark
An r/LocalLLaMA user shared informal impressions of JetBrains Mellum 2, focusing on local coding-style tasks and tool calls. On an AMD Radeon RX 7900 XT with llama.cpp Vulkan and 131K context, the model reportedly generated around 111 tokens/s and stayed above 100 tokens/s near full context. The author stresses this is not a scientific benchmark, but a practical workflow-oriented test.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 49 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
A llama.cpp CLI Command Builder
r/LocalLLaMA top day · 50 days ago · New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware; only Linux is currently supported.
Pipeline parallelism in llama.cpp may be wasting your VRAM
r/LocalLLaMA top day · 50 days ago · Benchmark
The author compared three llama.cpp Vulkan builds: default 4 sched copies, 1 sched copy, and no pipeline parallelism. In their Qwen GGUF test, input and output throughput were nearly identical across all configurations. However, the default setting used about 1.5GB more VRAM for compute buffers and reduced usable context from roughly 113K tokens to around 88K, though parallel-request benefits were not tested.
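A back-of-the-envelope reading of those numbers is sketched below. It assumes the entire ~1.5GB compute-buffer difference would otherwise be available for KV cache, that "K" means 1,000 tokens, and that "GB" means GiB. None of these is stated in the post, and the helper name is hypothetical.

```python
GIB = 1024 ** 3  # assumption: the post's "GB" is treated as GiB


def implied_bytes_per_token(extra_vram_bytes: float,
                            ctx_without_extra: int,
                            ctx_with_extra: int) -> float:
    """VRAM cost per token of context implied by a buffer-size difference.

    Divides the extra buffer memory by the number of context tokens lost.
    Assumption: all of the extra memory would otherwise hold KV cache.
    """
    lost_tokens = ctx_without_extra - ctx_with_extra
    if lost_tokens <= 0:
        raise ValueError("expected the extra buffers to reduce usable context")
    return extra_vram_bytes / lost_tokens


# Figures from the summary: ~1.5GB more VRAM, context ~113K -> ~88K tokens.
per_token = implied_bytes_per_token(1.5 * GIB, 113_000, 88_000)
print(f"implied cost: ~{per_token / 1024:.0f} KiB of VRAM per context token")
```

The result is only an order-of-magnitude check that the reported VRAM and context figures are mutually plausible. The real per-token cost depends on the model architecture and KV cache quantization.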
Quick note on recent QAT issues
r/LocalLLaMA top day · 50 days ago · Commentary
The post argues that recent Google QAT quantization has several implementation problems, including token embeddings being quantized to Q6_K instead of using a pure mode. It also claims llama-quantize has a hardcoded parameter that mismatches some optimized groups, and that 32-block groups are misaligned. The author recommends Unsloth UD Q4_K_XL as a temporary option and says they are working on a patch.
