Latest in AI

Showing: distributed-training, Researchers

Decoupled DiLoCo: Google DeepMind Introduces a More Flexible Distributed AI Training Technique ★ 85
Google DeepMind Blog · 97 days ago · Release
Google DeepMind has recently unveiled a new distributed AI training technique called "Decoupled DiLoCo." This technology represents a major upgrade to its…
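ImportAI 449: LLMs Training LLMs, 72B Distributed Training, Why Computer Vision Is Harder Than Text Generation, and Will AI Trigger a Political Transition Period? ★ 75
Import AI (Jack Clark) · 134 days ago · Commentary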
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
Ulysses Sequence Parallelism: A Technical Breakdown of Training Models with Million-Token Ultra-Long Context ★ 78
Hugging Face Blog · 141 days ago · Tutorial
As large language models (LLMs) push the demand for long context toward the million-token scale, the VRAM of a single GPU can no longer accommodate the…
Hugging Face Accelerate ND-Parallel Guide: A Complete Breakdown of Efficient Multi-GPU Training ★ 80
Hugging Face Blog · 354 days ago · Tutorial
As the parameter counts of generative AI and large language models (LLMs) push into the tens and hundreds of billions, the memory of a single GPU has long been…
Hugging Face Releases Accelerate 1.0.0: A New Milestone for Distributed Training and Large-Model Inference ★ 80
Hugging Face Blog · 683 days ago · Release
Hugging Face has officially released version 1.0.0 of its core open-source library, Accelerate. This is a milestone update, signifying that since the project's…
From DeepSpeed to FSDP and Back Again: Seamless Distributed Training with Hugging Face Accelerate ★ 75
Hugging Face Blog · 775 days ago · Tutorial
In the era of large language models (LLMs), the VRAM of a single GPU is often insufficient to hold models with tens of billions of parameters. To overcome this…
Fine-Tuning Llama 2 70B Efficiently with PyTorch FSDP: A Practical Guide to Solving CPU Out-of-Memory Issues ★ 72
Hugging Face Blog · 1,049 days ago · Tutorial
When fine-tuning massively large open-source models like Llama 2 70B — with its 70 billion parameters — developers frequently encounter a bottleneck that goes…
Training Language Models with 🤗 Transformers Using TensorFlow and TPUs ★ 70
Hugging Face Blog · 1,188 days ago · Tutorial
This technical guide from Hugging Face provides a detailed walkthrough of how to efficiently train language models by combining TensorFlow, the Hugging Face…
Databricks and Hugging Face Collaborate Closely: Up to 40% Faster Training and Fine-Tuning of Large Language Models (LLMs) ★ 70
Hugging Face Blog · 1,189 days ago · Business
This case study introduces a deep technical collaboration between Databricks and Hugging Face, aimed at addressing the efficiency and cost challenges…
Federated Learning with Hugging Face and Flower ★ 70
Hugging Face Blog · 1,219 days ago · Tutorial
As privacy awareness grows and regulatory requirements tighten, training machine learning models without centralizing sensitive data has become a critical…
From PyTorch DDP to Accelerate to Trainer: Mastering Distributed Training with Ease ★ 75
Hugging Face Blog · 1,376 days ago · Tutorial
This classic technical blog post from Hugging Face systematically guides developers in understanding and mastering distributed training techniques within the…
Hugging Face Explains: How 🤗 Accelerate Runs Very Large Models with the Help of PyTorch ★ 80
Hugging Face Blog · 1,400 days ago · Tutorial
As the parameter counts of large language models (LLMs) grow exponentially, how to load and run these models on limited hardware has become a major pain point…
How to Train Large Language Models with Megatron-LM: A Hands-On Guide from Hugging Face ★ 72
Hugging Face Blog · 1,420 days ago · Tutorial
As language model scales continue to expand, the memory (VRAM) of a single GPU has long been unable to accommodate models with tens or hundreds of billions of…
The Technology Behind BLOOM Training: How Megatron-DeepSpeed Was Used to Train a 176-Billion-Parameter Open-Source Model ★ 80
Hugging Face Blog · 1,475 days ago · Tutorial
This article documents in detail how the BigScience project trained BLOOM, an open-source multilingual large language model with 176 billion parameters. This…
Accelerating Very Large Model Training with DeepSpeed and Hugging Face Accelerate ★ 75
Hugging Face Blog · 1,491 days ago · Tutorial
This official Hugging Face blog post provides a detailed walkthrough of how to combine the `Accelerate` library with Microsoft's `DeepSpeed` deep learning…
Accelerating Very Large Model Training with PyTorch Fully Sharded Data Parallel (FSDP) ★ 75
Hugging Face Blog · 1,548 days ago · Release
As AI model scale has grown exponentially, training large models with billions of parameters has become the norm — but this also presents enormous hardware…
Accelerating PyTorch Distributed Fine-Tuning with Intel Technologies
Hugging Face Blog · 1,712 days ago · Tutorial
While GPUs dominate deep learning training today, a collaboration between Intel and Hugging Face demonstrates that through software and hardware optimization…
Introducing 🤗 Accelerate: A Lightweight Library for Easy PyTorch Distributed and Mixed-Precision Training ★ 78
Hugging Face Blog · 1,929 days ago · Release
Hugging Face has officially released a new open-source library called `Accelerate` — a lightweight helper library designed for PyTorch that aims to solve the…
Distributed Training with 🤗 Transformers and Amazon SageMaker: BART/T5 Summarization Models as an Example
Hugging Face Blog · 1,937 days ago · Tutorial
This technical guide, published by Hugging Face in 2021, details how to use Amazon SageMaker's managed infrastructure and distributed training capabilities to…
Fit More Parameters and Train Faster in Hugging Face with ZeRO via DeepSpeed and FairScale ★ 80
Hugging Face Blog · 2,016 days ago · Release
As the parameter scale of Transformer models (such as GPT, T5, etc.) grows exponentially, deep learning faces a severe "Memory Wall" challenge. With limited…
