Latest in AI

Showing: inference · Researchers

Accelerating Hugging Face Transformers Model Inference with AWS Inferentia2 ★ 70
Hugging Face Blog · 1,198 days ago · Release
This article explains how to accelerate the deployment and inference of Hugging Face Transformers models using AWS Inferentia2 (Inf2 instances) — AWS's…
Fast Inference on Large Language Models with Habana Gaudi2 Accelerators: BLOOMZ as a Case Study
Hugging Face Blog · 1,218 days ago · Tutorial
This article presents the results of a collaboration between Hugging Face and the Intel Habana team, focusing on how to leverage Intel's Habana Gaudi2 deep…
Why We're Switching to Hugging Face Inference Endpoints, and Maybe You Should Too
Hugging Face Blog · 1,259 days ago · Opinion
This case study from Mantis NLP details the core reasons behind their decision to migrate their machine learning model deployment workflow from traditional…
Accelerating PyTorch Transformer Model Inference with Intel Sapphire Rapids (Part 2)
Hugging Face Blog · 1,268 days ago · Tutorial
This article is the second installment of a Hugging Face series on accelerating PyTorch Transformer models on Intel's 4th-generation Xeon Scalable Processors…
Accelerating PyTorch Transformers Models with Intel Sapphire Rapids - Part 1
Hugging Face Blog · 1,303 days ago · Tutorial
This article is the first installment in a collaboration series between Hugging Face and Intel, focusing on how to accelerate PyTorch Transformer models using…
A Complete Guide to Hugging Face Inference Solutions: From Free API to Enterprise Deployment ★ 75
Hugging Face Blog · 1,345 days ago · Tutorial
As the world's largest open-source AI model hub, Hugging Face not only provides model hosting but has also built a complete inference ecosystem. This article…
Accelerate Your Hugging Face Models with 🤗 Optimum Intel and OpenVINO
Hugging Face Blog · 1,364 days ago · New Tool
As Transformer models become increasingly prevalent in natural language processing (NLP) and computer vision (CV), efficiently deploying these large models in…
Getting Started with Hugging Face Inference Endpoints: Easily Deploy Production-Grade AI Models ★ 75
Hugging Face Blog · 1,383 days ago · Tutorial
Hugging Face Inference Endpoints is a fully managed service designed for developers and enterprises, built to solve the pain points of deploying machine…
Hugging Face Explains: How 🤗 Accelerate Runs Very Large Models with PyTorch ★ 80
Hugging Face Blog · 1,400 days ago · Tutorial
As the parameter counts of large language models (LLMs) grow exponentially, how to load and run these models on limited hardware has become a major pain point…
Ultra-Fast BLOOM Inference with DeepSpeed and Accelerate
Hugging Face Blog · 1,411 days ago · Tutorial
BLOOM is a massive open-source multilingual model with 176 billion parameters. Running BLOOM at FP16 precision requires at least 352 GB of video memory (VRAM)…
A Gentle Introduction to 8-bit Matrix Multiplication: Quantizing Very Large Transformer Models with Transformers, Accelerate, and bitsandbytes ★ 80
Hugging Face Blog · 1,441 days ago · Release
This article introduces the deep integration between Hugging Face and the bitsandbytes library, aimed at solving the enormous memory challenges posed by…
Converting Transformers Models to ONNX with Hugging Face Optimum
Hugging Face Blog · 1,497 days ago · Tutorial
When deploying Transformer models in production, latency and throughput are typically the key factors determining the quality of the user experience. ONNX…
Accelerating Model Inference with Optimum and Transformers Pipelines ★ 75
Hugging Face Blog · 1,540 days ago · Release
When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has…
Deploying GPT-J 6B for Inference with Hugging Face Transformers and Amazon SageMaker
Hugging Face Blog · 1,659 days ago · Tutorial
With the rise of open-source large language models, deploying these models in cloud environments in a secure, stable, and scalable manner has become a critical…
Scaling Up BERT-like Model Inference Performance on Modern CPUs - Part 2
Hugging Face Blog · 1,727 days ago · Tutorial
This blog post is the second part of a technical guide co-authored by Hugging Face and Intel, designed to show developers how to push the inference performance…
How Hugging Face Sped Up Transformer Inference 100x for API Customers
Hugging Face Blog · 2,017 days ago · Release
In this technical blog post, the Hugging Face team reveals in detail how they achieved up to 100x speedup in inference for Transformer models for customers of…
