Latest in AI

DiffusionGemma: 4x faster text generation ★ 74
Google DeepMind Blog · 47 days ago · Release
Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
Intel Arc Pro B70 GPU Debuts at MPTS2026 for AI Creative Workflows
量子位 QbitAI · 48 days ago · Hardware
Intel presented the Arc Pro B70 GPU at MPTS2026 as a professional GPU for AI-assisted media creation and teaching labs. The article highlights 32GB GDDR6 memory, second-gen Xe² architecture, 32 Xe cores, XMX acceleration, and up to 367 TOPS INT8 performance. Lenovo ThinkStation workstations and GUNNIR’s Arc Pro B70 TF 32G are positioned as ecosystem solutions for local AIGC, rendering, virtual production, and data-sensitive education deployments.
Single-slot half-height PCIe V100 with NVLink appears in China
r/LocalLLaMA top day · 48 days ago · Hardware
An r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, and capped at 75W, with another version supporting up to 300W. The 16GB model is expected to cost around or below ¥1500, and a 32GB version is reportedly planned, but neither is available for purchase yet.
PSA: Throttle GPU Power Limits for Major Energy Savings with Minimal Inference Performance Loss
r/LocalLLaMA top day · 48 days ago · Hardware
A Reddit user reminds the local LLM community that lowering GPU power limits can save substantial energy at little performance cost. On dual Radeon VII cards, cutting the limit from 250W to 100W per card reduced inference speed by less than 10%. The post's explanation is that LLM inference is memory-bound rather than compute-bound, so it tolerates reduced GPU clock speeds better than training or rendering does.
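To put the quoted figures in terms of energy per generated token, here is a minimal sketch. It assumes each card actually draws its configured power limit under load and takes the worst case of the "less than 10%" slowdown (90% of the original speed retained). The function name `energy_per_token_ratio` is hypothetical and not from the post.

```python
def energy_per_token_ratio(old_watts: float, new_watts: float,
                           speed_retained: float) -> float:
    """Return (new energy per token) / (old energy per token).

    Energy per token is power divided by throughput, so the ratio is
    (new_watts / old_watts) / speed_retained.
    Assumption: the card draws exactly its power limit while generating.
    """
    if old_watts <= 0 or new_watts <= 0:
        raise ValueError("power limits must be positive")
    if not 0 < speed_retained <= 1:
        raise ValueError("speed_retained must be in (0, 1]")
    return (new_watts / old_watts) / speed_retained


# Figures from the post: 250W -> 100W per card, under 10% slowdown.
# 0.90 is the assumed worst case for the retained speed.
ratio = energy_per_token_ratio(250, 100, 0.90)
print(f"energy per token: {ratio:.1%} of the original")
print(f"energy saved per token: {1 - ratio:.1%}")
```

Under these assumptions each token costs a bit under half the energy it did at the stock limit. Actual draw under a memory-bound load may sit below the configured limit, so measured savings can differ.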
Xiaomi Claims 1,000+ TPS on a 1T Model Using a Standard 8-GPU Server ★ 72
r/LocalLLaMA top day · 49 days ago · Benchmark
Xiaomi announced MiMo-V2.5-Pro-UltraSpeed with TileRT, claiming over 1,000 tokens/s decode speed on a 1-trillion-parameter MoE model. The company says it runs on a single standard 8-GPU commodity node, not wafer-scale or SRAM-heavy specialized hardware. The claimed stack combines FP4 MoE expert quantization, DFlash speculative decoding, and TileRT low-latency inference kernels, but independent validation is still needed.
When GPUs Turn from Cost Burden into Profit Engine, Enterprise AI Enters a New Game
INSIDE 硬塞 AI · 50 days ago · Business
INFINITIX addresses low GPU utilization with software designed for enterprise AI infrastructure. Its AI-Stack uses virtualization and scheduling to maximize GPU efficiency and reduce idle compute. The ixCSP platform helps service providers turn compute capacity into operational cloud services, reframing GPUs from a cost burden into a potential revenue-generating asset.
AI Enters the Inference Era: AMD's Lisa Su Expects 35% Annual CPU Market Growth, with Architectures Trending Toward "1:1" ★ 75
INSIDE 硬塞 AI · 67 days ago · Opinion
AMD CEO Lisa Su recently shared her latest views on the AI hardware market, pointing out that the AI industry is approaching a critical inflection point…
Train AI Models for Free: Hugging Face and Unsloth Launch Free Fine-Tuning on Hugging Face Jobs ★ 85
Hugging Face Blog · 158 days ago · New Tool
Hugging Face's official blog has announced exciting news for the open-source AI community: Hugging Face has formed a deep partnership with Unsloth — the…
We Got Claude to Write CUDA Kernels and Teach Open Models: Hugging Face Releases the Upskill Project ★ 80
Hugging Face Blog · 181 days ago · Release
Background and Challenge: Why Is CUDA Programming So Hard for AI? CUDA (Compute Unified Device Architecture) is a parallel computing platform and…
Easily Build and Share ROCm Kernels with Hugging Face ★ 70
Hugging Face Blog · 253 days ago · Release
Hugging Face recently announced a major update for AMD GPU users and developers, aimed at simplifying the process of building, packaging, and sharing ROCm…
Exploring the Shifting Global Compute Landscape: Hugging Face Analyzes the Future of AI Infrastructure ★ 75
Hugging Face Blog · 271 days ago · Opinion
Against the backdrop of explosive global growth in artificial intelligence, compute has become the core resource that determines technological competitiveness…
Scaleway Officially Joins Hugging Face Inference Providers 🔥 ★ 70
Hugging Face Blog · 312 days ago · Release
Hugging Face has announced a deep partnership with Scaleway, a leading European cloud infrastructure provider, with Scaleway officially joining the Hugging…
From Zero to GPU: A Practical Guide to Building and Scaling Production-Grade CUDA Kernels ★ 80
Hugging Face Blog · 344 days ago · Tutorial
As the architecture and scale of deep learning models (such as large language models, or LLMs) continue to expand, standard PyTorch operators sometimes fall…
Building Custom Kernels for AMD MI300: Unlocking Peak AMD GPU Performance with Triton ★ 72
Hugging Face Blog · 384 days ago · Tutorial
As AMD Instinct MI300 series GPUs (such as the MI300X) gradually increase their market share in the AI compute market, how to perform low-level optimization…
Featherless AI Officially Joins Hugging Face Inference Providers ★ 75
Hugging Face Blog · 411 days ago · Release
Hugging Face officially announced a partnership with Featherless AI, a serverless GPU inference platform, integrating it into the Hugging Face Inference…
Hugging Face and NVIDIA Launch the New "Training Cluster as a Service" ★ 85
Hugging Face Blog · 412 days ago · Release
Hugging Face has announced a new partnership with AI chip giant NVIDIA, launching "Training Cluster as a Service" (TCaaS). The introduction of this service…
Replicate Officially Supports NVIDIA H100 GPUs: Higher Performance, Lower Cost
Replicate Blog · 438 days ago · Release
Replicate, the well-known AI model cloud hosting platform, has announced that it is officially introducing and supporting NVIDIA H100 GPUs within its…
AINews: Not Much Happened Today; Worth Following the Discussion of SF Compute and Emerging GPU Clouds
TLDR AI (Buttondown) · 472 days ago · Commentary
After a week that was expected to potentially be turbulent but turned out to be quite calm, the latest issue of AINews briefly declares that "nothing major…
Hugging Face Adds Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita AI ★ 75
Hugging Face Blog · 525 days ago · Release
On February 18, 2025, Hugging Face announced the addition of three new partners to its serverless inference ecosystem: Hyperbolic, Nebius AI Studio, and Novita…
Replicate Officially Supports NVIDIA L40S GPUs: Better Performance, Lower Cost
Replicate Blog · 620 days ago · New Tool
The AI deployment platform Replicate has announced the official availability of NVIDIA L40S GPU compute on its platform. This update aims to provide developers…
Hugging Face and NVIDIA NIM Launch Serverless Inference ★ 82
Hugging Face Blog · 729 days ago · Release
Hugging Face and NVIDIA announced a major partnership in late July 2024, officially launching a serverless inference service powered by NVIDIA NIM (NVIDIA…
Replicate Intelligence #4: Exploring Concepts in GPT Models, Real-Time Speech-to-Text in the Browser, and H100 GPUs Coming Soon
Replicate Blog · 774 days ago · Release
Replicate has published its technical newsletter, Replicate Intelligence #4, summarizing recent major developments in the AI field as well as the latest…
NVIDIA H100 GPUs Are Coming to Replicate: Faster Model Inference and Training
Replicate Blog · 776 days ago · Release
The official blog of Replicate, the popular AI model hosting and deployment platform, has announced that NVIDIA H100 Tensor Core GPUs will soon be officially…
Easily Train Hugging Face Models on H100 GPUs with NVIDIA DGX Cloud ★ 75
Hugging Face Blog · 862 days ago · Release
Hugging Face has announced a deep partnership with NVIDIA to directly integrate NVIDIA DGX Cloud services into the Hugging Face platform. This collaboration…
Optimum-NVIDIA: Unlock Blazing-Fast LLM Inference with Just One Line of Code ★ 80
Hugging Face Blog · 966 days ago · Release
Hugging Face announced the launch of a new open-source library called "Optimum-NVIDIA," the result of a deep collaboration with NVIDIA, aimed at seamlessly…
AMD and Hugging Face Launch optimum-amd for Out-of-the-Box LLM Acceleration on AMD GPUs ★ 75
Hugging Face Blog · 966 days ago · Release
Hugging Face's official blog announced a deep partnership with chip giant AMD, launching `optimum-amd`, an open-source library optimized specifically for AMD…
Making Large Language Models Lighter with AutoGPTQ and transformers ★ 85
Hugging Face Blog · 1,070 days ago · Release
This Hugging Face official blog post introduces a major update that integrates AutoGPTQ into the `transformers` and `optimum` libraries. GPTQ (Generalized…
Hugging Face and AMD Partner to Accelerate State-of-the-Art AI Models on CPU and GPU Platforms ★ 70
Hugging Face Blog · 1,141 days ago · Business
In June 2023, Hugging Face officially announced a long-term strategic partnership with chip giant AMD. The core objective of this collaboration is to optimize…
Hugging Face Introduces New Pricing: PRO Subscription and Pay-As-You-Go Billing
Hugging Face Blog · 1,358 days ago · Release
Hugging Face officially announced a new platform pricing structure, designed to provide more flexible and affordable options for community members…
