Hugging Face Blog, Aug 17, 2022

Getting Started with 8-bit Matrix Multiplication: Quantizing Very Large Transformer Models with Transformers, Accelerate, and bitsandbytes

Original: A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

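Hugging Face has announced a deep integration with bitsandbytes that supports the LLM.int8() 8-bit quantization technique. Using mixed-precision decomposition, the technique keeps outlier values in FP16 and quantizes the remaining values to 8-bit, which halves the memory requirement of large models such as BLOOM-176B. Developers now only need to pass load_in_8bit=True to from_pretrained to run, on consumer GPUs, large language models that previously required multiple GPUs.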
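To make the mixed-precision decomposition concrete, here is a minimal NumPy sketch of the idea. It is an illustration under stated assumptions, not the bitsandbytes implementation: the function names are hypothetical, the outlier threshold of 6.0 is an assumed default, and the scaling scheme (one absmax scale per row of the input and per column of the weights) is one common choice.

```python
import numpy as np

def absmax_quantize(a, axis):
    """Scale each row (axis=1) or column (axis=0) so its largest magnitude maps to 127."""
    scale = np.max(np.abs(a), axis=axis, keepdims=True, initial=0.0) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for all-zero vectors
    q = np.round(a / scale).astype(np.int8)
    return q, scale

def mixed_precision_matmul(x, w, threshold=6.0):
    """Approximate x @ w: outlier feature dimensions in fp16, the rest in int8.

    Hypothetical helper; threshold=6.0 is an assumed value for illustration.
    """
    # A feature dimension (column of x) counts as an outlier if any entry reaches the threshold.
    outlier = np.any(np.abs(x) >= threshold, axis=0)

    # Outlier part: kept in 16-bit floating point.
    hi = x[:, outlier].astype(np.float16) @ w[outlier, :].astype(np.float16)

    # Regular part: row-wise scales for x, column-wise scales for w.
    xq, sx = absmax_quantize(x[:, ~outlier], axis=1)
    wq, sw = absmax_quantize(w[~outlier, :], axis=0)
    acc = xq.astype(np.int32) @ wq.astype(np.int32)  # integer accumulation
    lo = acc * (sx * sw)                             # dequantize with the outer product of scales

    return lo + hi.astype(np.float32)

# Small demo: two feature dimensions carry large-magnitude values.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 8)).astype(np.float32)
x[:, 3] = 40.0
x[:, 17] = -35.0

exact = x @ w
approx = mixed_precision_matmul(x, w)
print("max abs error:", np.max(np.abs(approx - exact)))
```

The sketch is independent of the library and only illustrates why separating the outliers helps: without the split, a single large value would stretch the quantization scale for its whole row and coarsen every other entry. In Transformers itself, the load_in_8bit=True argument described above is what enables the 8-bit path.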

This article introduces the deep integration between Hugging Face and the bitsandbytes library, aimed at solving the enormous memory challenges posed by extremely large Transformer models (such as those with 175B parameters) during inference and fine-tuning. This integration is based on the paper "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale" by Tim Dettmers and colleagues.
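The scale of the memory problem can be estimated with simple arithmetic. The sketch below counts weight storage only, assuming 2 bytes per parameter in FP16 and 1 byte per parameter in int8; it ignores activations, the outlier values that stay in FP16, and framework overhead, so real usage will be somewhat higher. The helper name is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

num_params = 175e9  # the 175B-parameter scale mentioned above

fp16_gb = weight_memory_gb(num_params, 2)  # 2 bytes per FP16 weight
int8_gb = weight_memory_gb(num_params, 1)  # 1 byte per int8 weight

print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"int8 weights: {int8_gb:.0f} GB")
```

Under these assumptions the weights drop from 350 GB to 175 GB, which is where the "memory requirement halved" figure comes from.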

open-source transformers accelerate bitsandbytes #quantization #llm-int8 #gpu-memory #inference

Summaries are AI-generated; the original article is authoritative.