Latest in AI

Showing: pytorch

Profiling in PyTorch Part 2: From nn.Linear to a Fused MLP
Hugging Face Blog · 47 days ago · Tutorial
This Hugging Face Blog post appears to be a technical tutorial in a PyTorch profiling series. From the title, it focuses on analyzing performance from basic nn.Linear operations to a fused multilayer perceptron implementation. The likely audience is ML engineers and developers interested in understanding where neural network execution time goes and how kernel fusion can improve model throughput.
Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler
Hugging Face Blog · 60 days ago · Tutorial
Based on the title, this Hugging Face Blog post is an introductory PyTorch profiling guide focused on torch.profiler. It likely targets developers and ML engineers who need to identify training or inference bottlenecks through observable performance data. Since the full article text was not provided, implementation details, examples, and specific optimization advice cannot be confirmed.
Safetensors Officially Joins the PyTorch Foundation, Advancing a Safe and Efficient Model Weight Standard ★ 75
Hugging Face Blog · 111 days ago · Business
Hugging Face has officially announced that its popular open-source model weight storage format, Safetensors, has joined the PyTorch Foundation. This is an…
Transformers v5 Officially Released: Simpler Model Definitions, Empowering the Entire AI Ecosystem ★ 90
Hugging Face Blog · 239 days ago · Release
The `transformers` library from Hugging Face is a cornerstone of today's AI and open-source community. With the official release of v5, the team has introduced…
Arm to Exhibit at PyTorch Conference, Showcasing AI Inference on the Arm Architecture and KleidiAI Optimizations
Hugging Face Blog · 290 days ago · Business
Arm has officially announced on the Hugging Face blog that it will actively participate in the upcoming PyTorch Conference. As the Arm architecture gains…
Speeding Up Model Startup and Inference with Torch Compile Caching ★ 75
Replicate Blog · 323 days ago · Tutorial
When deploying modern AI models (such as LLaMA, Flux, or Stable Diffusion), `torch.compile` — introduced in PyTorch 2.0 — is a powerful performance…
Make Your ZeroGPU Spaces Fly: Eliminating Cold-Start Latency with PyTorch AOT (Ahead-of-Time) Compilation ★ 75
Hugging Face Blog · 329 days ago · Tutorial
Hugging Face's ZeroGPU Spaces offers developers a free and efficient way to deploy GPU-accelerated AI applications. However, ZeroGPU uses a dynamic allocation…
From Zero to GPU: A Practical Guide to Building and Scaling Production-Grade CUDA Kernels ★ 80
Hugging Face Blog · 344 days ago · Tutorial
As the architecture and scale of deep learning models (such as large language models, or LLMs) continue to expand, standard PyTorch operators sometimes fall…
Implementing KV Cache from Scratch in nanoVLM ★ 75
Hugging Face Blog · 419 days ago · Tutorial
In the inference process of large language models (LLMs) and vision-language models (VLMs), autoregressive decoding is a major performance bottleneck. Each…
nanoVLM: The Simplest Open-Source Project for Training Vision Language Models (VLMs) in Pure PyTorch ★ 75
Hugging Face Blog · 433 days ago · Release
Hugging Face recently launched an open-source project called nanoVLM, positioned as "the simplest repository for training Vision Language Models (VLMs) in pure…
Visualizing and Understanding GPU Memory Usage in PyTorch ★ 82
Hugging Face Blog · 581 days ago · Tutorial
One of the most common pain points developers face in deep learning and large language model (LLM) training is the "Out of Memory (OOM)" error. To help…
Hugging Face Releases Accelerate 1.0.0: A New Milestone for Distributed Training and Large-Model Inference ★ 80
Hugging Face Blog · 683 days ago · Release
Hugging Face has officially released version 1.0.0 of its core open-source library, Accelerate. This is a milestone update, signifying that since the project's…
Building Memory-Efficient Diffusion Transformers (DiT) with Quanto and Diffusers ★ 80
Hugging Face Blog · 728 days ago · Release
As generative AI technology evolves, image and video generation models are increasingly transitioning from traditional UNet…
Hugging Face Introduces Quanto: A New PyTorch Quantization Backend for Optimum ★ 75
Hugging Face Blog · 862 days ago · Release
Hugging Face has officially introduced Quanto, a brand-new quantization library designed for PyTorch, which has been integrated as a backend into the Hugging…
Optimum + ONNX Runtime: Easier, Faster Training for Hugging Face Models ★ 75
Hugging Face Blog · 1,281 days ago · Release
As the scale of deep learning models (such as Transformers) continues to grow, training these models demands enormous computational resources and time. To help…
Accelerating PyTorch Transformers Models with Intel Sapphire Rapids, Part 1
Hugging Face Blog · 1,303 days ago · Tutorial
This article is the first installment in a collaboration series between Hugging Face and Intel, focusing on how to accelerate PyTorch Transformer models using…
Hugging Face Explains: How 🤗 Accelerate Runs Very Large Models with PyTorch ★ 80
Hugging Face Blog · 1,400 days ago · Tutorial
As the parameter counts of large language models (LLMs) grow exponentially, how to load and run these models on limited hardware has become a major pain point…
Running Stable Diffusion Locally on an M1 Mac GPU
Replicate Blog · 1,427 days ago · Tutorial
With the open-sourcing of Stable Diffusion, running powerful AI image generation models locally has become a real possibility. This guide published by…
Implementing Policy Gradient with PyTorch: Hugging Face Deep Reinforcement Learning Tutorial
Hugging Face Blog · 1,489 days ago · Tutorial
This tutorial comes from Unit 4 of Hugging Face's Deep Reinforcement Learning Course, covering the topic of "Implementing Policy Gradients with PyTorch." In…
Accelerating Very Large Model Training with DeepSpeed and Hugging Face Accelerate ★ 75
Hugging Face Blog · 1,491 days ago · Tutorial
This official Hugging Face blog post provides a detailed walkthrough of how to combine the `Accelerate` library with Microsoft's `DeepSpeed` deep learning…
Diffusion Models Explained: A Practical Guide to the Code and Theory of The Annotated Diffusion Model ★ 85
Hugging Face Blog · 1,512 days ago · Tutorial
This classic blog post from Hugging Face, "The Annotated Diffusion Model," is an essential guide for learning about generative AI image synthesis. Modeled…
Welcoming fastai to the Hugging Face Hub
Hugging Face Blog · 1,544 days ago · Release
Hugging Face has officially announced a deep integration with the well-known high-level deep learning library fastai, formally bringing fastai into the Hugging…
Accelerating Very Large Model Training with PyTorch Fully Sharded Data Parallel (FSDP) ★ 75
Hugging Face Blog · 1,548 days ago · Release
As AI model scale has grown exponentially, training large models with billions of parameters has become the norm — but this also presents enormous hardware…
Accelerating PyTorch Distributed Fine-Tuning with Intel Technologies
Hugging Face Blog · 1,712 days ago · Tutorial
While GPUs dominate deep learning training today, a collaboration between Intel and Hugging Face demonstrates that through software and hardware optimization…
Scaling BERT Inference Performance on CPU (Part 1)
Hugging Face Blog · 1,925 days ago · Tutorial
In many real-world enterprise production environments, although GPUs offer extremely high throughput for deep learning inference, CPUs remain indispensable due…
Introducing 🤗 Accelerate: A Lightweight Library for Easy PyTorch Distributed and Mixed-Precision Training ★ 78
Hugging Face Blog · 1,929 days ago · Release
Hugging Face has officially released a new open-source library called `Accelerate` — a lightweight helper library designed for PyTorch that aims to solve the…
Building Smaller, Faster Language Models with Block Sparse Matrices
Hugging Face Blog · 2,147 days ago · Tutorial
In the field of natural language processing (NLP), the Transformer architecture has become the dominant paradigm, but its core self-attention mechanism…
