Latest in AI

Showing: multi-gpu · Developers


club-3090 Adds Experimental FP8 Support for Qwen3.6-27B
r/LocalLLaMA top day · 50 days ago · New Tool
The open-source project club-3090 has rolled out experimental FP8 quantization support for Qwen3.6-27B. The update has been highly anticipated by dual RTX 3090 users, since it lets them run the model with significantly reduced VRAM requirements. According to reports, the official Qwen3.6-27B-FP8 model performs virtually identically to the original unquantized BF16 version.
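To make the VRAM claim concrete, here is a rough back-of-the-envelope sketch. It assumes 27 billion parameters (taken from the "27B" in the model name), 2 bytes per BF16 weight, 1 byte per FP8 weight, and 24 GB per RTX 3090. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Weight-only memory estimate; KV cache, activations, and overhead are ignored.
PARAMS = 27e9                             # assumption: read off the "27B" model name
BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}   # storage size of one weight
TOTAL_VRAM_GB = 2 * 24                    # two RTX 3090 cards, 24 GB each


def weight_gb(params, bytes_per_param):
    """Return the approximate size of the weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    gb = weight_gb(PARAMS, nbytes)
    verdict = "fits" if gb < TOTAL_VRAM_GB else "does not fit"
    print(f"{fmt}: ~{gb:.0f} GB of weights, {verdict} in {TOTAL_VRAM_GB} GB")
```

Under these assumptions the BF16 weights alone (about 54 GB) exceed the 48 GB available on two cards, while the FP8 weights (about 27 GB) leave room for context.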
llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day · 50 days ago · Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
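The sketch below illustrates the general idea of per-process GPU visibility that the post says router mode lacks. It is not llama-server's router and not a fix for it: it assumes you launch each model server yourself as a separate process. The helper names `env_for_gpus` and `launch` are hypothetical, and the child command is a stand-in that only prints the variable it receives.

```python
import os
import subprocess
import sys


def env_for_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment restricted to the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates the devices listed here, so no context is
    # created on any other card.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch(cmd, gpu_ids):
    """Start cmd as a child process that can only see gpu_ids."""
    return subprocess.Popen(cmd, env=env_for_gpus(gpu_ids))


if __name__ == "__main__":
    # Stand-in child process; replace with a real server command line.
    child = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    launch(child, [2]).wait()
```

Because the variable is set before the child starts, the CUDA runtime in that process never touches the hidden cards, which sidesteps the context allocation that fails on a fully loaded GPU.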
Hugging Face Accelerate ND-Parallel Guide: A Complete Breakdown of Efficient Multi-GPU Training ★ 80
Hugging Face Blog · 354 days ago · Tutorial
As the parameter counts of generative AI and large language models (LLMs) push into the tens and hundreds of billions, the memory of a single GPU has long been…
From PyTorch DDP to Accelerate to Trainer: Mastering Distributed Training with Ease ★ 75
Hugging Face Blog · 1,376 days ago · Tutorial
This classic technical blog post from Hugging Face systematically guides developers in understanding and mastering distributed training techniques within the…
Introducing 🤗 Accelerate: A Lightweight Library for Easy PyTorch Distributed and Mixed-Precision Training ★ 78
Hugging Face Blog · 1,929 days ago · Release
Hugging Face has officially released a new open-source library called `Accelerate` — a lightweight helper library designed for PyTorch that aims to solve the…