Latest in AI


llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day | 50 days ago | Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
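The per-model device restriction described above can be illustrated outside router mode. The following is a minimal sketch, not something taken from the post or from llama-server itself: it assumes each model is started as a separate process by a small launcher of your own, and the helper names `env_for_gpu` and `launch_pinned` are hypothetical. It relies only on the fact that the CUDA runtime enumerates just the devices listed in `CUDA_VISIBLE_DEVICES`.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU to CUDA.

    `gpu_index` is the physical device index. Inside the child process that
    device is then seen as device 0. The base environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_pinned(command, gpu_index):
    """Start `command` (hypothetical helper) as a child limited to one GPU."""
    return subprocess.Popen(command, env=env_for_gpu(gpu_index))


if __name__ == "__main__":
    # Stand-in child process that only reports the variable it received.
    # In practice `command` would be the server invocation for one model.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    launch_pinned(child, 1).wait()
```

Because the variable is set per child rather than inherited unchanged from the parent, a small model started this way never touches the other cards, so it should not need to allocate a context on GPUs that are already full.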
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) | 52 days ago | New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with the CUDA toolkit listed in the setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Show HN: Tiny-vLLM, a C++ and CUDA LLM Inference Engine
Hacker News (AI keywords) | 59 days ago | New Tool
Tiny-vLLM is a Show HN project described as a high-performance LLM inference engine implemented in C++ and CUDA. From the provided title alone, the project appears aimed at developers or ML engineers interested in GPU-accelerated local or server-side inference. No further claims about supported models, benchmarks, APIs, licensing, deployment targets, or production readiness are stated in the source.
Import AI 448: AI R&D Trends, ByteDance's CUDA-Writing Agent, Satellite Edge AI, and the Future of AI Warfare ★ 75
Import AI (Jack Clark) | 141 days ago | Commentary
This issue of Import AI 448, written by Jack Clark, takes a deep dive into the latest developments in AI R&D, automated hardware optimization, and the…
Hugging Face Releases a New Technique: Enabling AI Agents to Automatically Write and Optimize Custom CUDA Kernels ★ 80
Hugging Face Blog | 165 days ago | New Tool
As the demand for computational efficiency in deep learning models continues to grow, writing custom CUDA kernels (functions that execute on the GPU) has become a key…
We Had Claude Write CUDA Kernels and Teach Open-Source Models! Hugging Face Launches the Upskill Project ★ 80
Hugging Face Blog | 181 days ago | Release
Background and challenge: why is CUDA programming so hard for AI? CUDA (Compute Unified Device Architecture) is a parallel computing platform and…
From Zero to GPU: A Practical Guide to Building and Scaling Production-Grade CUDA Kernels ★ 80
Hugging Face Blog | 344 days ago | Tutorial
As the architecture and scale of deep learning models (such as large language models, or LLMs) continue to expand, standard PyTorch operators sometimes fall…
Get Started with Hugging Face Kernel Hub in 5 Minutes: A New Starting Point for Hosting GPU-Accelerated Operators ★ 78
Hugging Face Blog | 411 days ago | Release
The Hugging Face official blog published a "Get Started with Hugging Face Kernel Hub in 5 Minutes" tutorial, formally introducing this new platform to the…