Latest in AI

Showing: inference

Goodbye Cold Starts: How Hugging Face Sped Up LoRA Inference by 300% ★ 85
Hugging Face Blog · 966 days ago · Release
In real-world generative AI applications, fine-tuning for specific tasks or clients is a common requirement. However, deploying a full base model for every…
Optimum-NVIDIA: Unlock Blazing-Fast LLM Inference with Just One Line of Code ★ 80
Hugging Face Blog · 966 days ago · Release
Hugging Face announced the launch of a new open-source library called "Optimum-NVIDIA," the result of a deep collaboration with NVIDIA, aimed at seamlessly…
Make Your Llama Generation Fly: Accelerating with AWS Inferentia2 ★ 72
Hugging Face Blog · 994 days ago · Tutorial
As large language models (LLMs) such as Llama 2 become more widely adopted, achieving efficient and cost-effective inference in production environments has…
Easily Deploy High-Performance Embedding Models with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 1,008 days ago · Release
As large language models (LLMs) and Retrieval-Augmented Generation (RAG) technology become increasingly widespread, embedding models have become an…
Hugging Face Launches a Dedicated Inference Service for PRO Subscribers: Higher Rate Limits and Support for Large Open-Source Models ★ 70
Hugging Face Blog · 1,040 days ago · Release
The Hugging Face official blog has announced a new "Inference for PROs" upgraded service for PRO subscribers (at $9 per month). This service is designed to…
A Complete Breakdown of Quantization Schemes Natively Supported in Hugging Face Transformers: A Practical Guide to bitsandbytes and GPTQ ★ 75
Hugging Face Blog · 1,050 days ago · Tutorial
As the parameter count of large language models (LLMs) has grown dramatically, running and fine-tuning these models on consumer-grade GPUs or limited hardware…
Fetch Cuts ML Processing Latency by 50% Using Amazon SageMaker and Hugging Face
Hugging Face Blog · 1,061 days ago · Business
This case study examines how Fetch, a leading consumer rewards platform in the United States, leveraged the collaboration between Amazon SageMaker and Hugging…
A Panoramic Guide to Hugging Face's Open-Source Text Generation and LLM Ecosystem ★ 85
Hugging Face Blog · 1,107 days ago · Release
This official Hugging Face blog post systematically maps out the complete ecosystem it has built around open-source large language models (LLMs). As…
Easily Deploy Large Language Models (LLMs) with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 1,120 days ago · Tutorial
This official Hugging Face blog post introduces how to use their hosted service "Inference Endpoints" to deploy large language models (LLMs). With the rapid…
The Falcon Family of Open-Source Models Officially Lands in the Hugging Face Ecosystem ★ 75
Hugging Face Blog · 1,149 days ago · Release
The Falcon series of large language models (including Falcon-40B and Falcon-7B), developed by Abu Dhabi's Technology Innovation Institute (TII), have…
Accelerating Hugging Face Transformers Model Inference with AWS Inferentia2 ★ 70
Hugging Face Blog · 1,198 days ago · Release
This article explains how to accelerate the deployment and inference of Hugging Face Transformers models using AWS Inferentia2 (Inf2 instances) — AWS's…
Fast Large Language Model Inference on Habana Gaudi2 Accelerators: The BLOOMZ Example
Hugging Face Blog · 1,218 days ago · Tutorial
This article presents the results of a collaboration between Hugging Face and the Intel Habana team, focusing on how to leverage Intel's Habana Gaudi2 deep…
Why We Switched to Hugging Face Inference Endpoints, and Maybe You Should Too
Hugging Face Blog · 1,259 days ago · Opinion
This case study from Mantis NLP details the core reasons behind their decision to migrate their machine learning model deployment workflow from traditional…
Accelerating PyTorch Transformer Model Inference with Intel Sapphire Rapids (Part 2)
Hugging Face Blog · 1,268 days ago · Tutorial
This article is the second installment of a Hugging Face series on accelerating PyTorch Transformer models on Intel's 4th-generation Xeon Scalable Processors…
Accelerating PyTorch Transformers Models with Intel Sapphire Rapids - Part 1
Hugging Face Blog · 1,303 days ago · Tutorial
This article is the first installment in a collaboration series between Hugging Face and Intel, focusing on how to accelerate PyTorch Transformer models using…
A Panoramic Guide to Hugging Face Inference Solutions: From the Free API to Enterprise-Grade Deployment ★ 75
Hugging Face Blog · 1,345 days ago · Tutorial
As the world's largest open-source AI model hub, Hugging Face not only provides model hosting but has also built a complete inference ecosystem. This article…
Accelerate Your Hugging Face Models with 🤗 Optimum Intel and OpenVINO
Hugging Face Blog · 1,364 days ago · New Tool
As Transformer models become increasingly prevalent in natural language processing (NLP) and computer vision (CV), efficiently deploying these large models in…
Getting Started with Hugging Face Inference Endpoints: Easily Deploy Production-Grade AI Models ★ 75
Hugging Face Blog · 1,383 days ago · Tutorial
Hugging Face Inference Endpoints is a fully managed service designed for developers and enterprises, built to solve the pain points of deploying machine…
Hugging Face Reveals: How 🤗 Accelerate Runs Very Large Models with PyTorch ★ 80
Hugging Face Blog · 1,400 days ago · Tutorial
As the parameter counts of large language models (LLMs) grow exponentially, how to load and run these models on limited hardware has become a major pain point…
Blazing-Fast BLOOM Inference with DeepSpeed and Accelerate
Hugging Face Blog · 1,411 days ago · Tutorial
BLOOM is a massive open-source multilingual model with 176 billion parameters. Running BLOOM at FP16 precision requires at least 352 GB of video memory (VRAM)…
An Easy Start with 8-bit Matrix Multiplication: Quantizing Very Large Transformer Models with Transformers, Accelerate, and bitsandbytes ★ 80
Hugging Face Blog · 1,441 days ago · Release
This article introduces the deep integration between Hugging Face and the bitsandbytes library, aimed at solving the enormous memory challenges posed by…
Converting Transformers Models to ONNX Format with Hugging Face Optimum
Hugging Face Blog · 1,497 days ago · Tutorial
When deploying Transformer models in production, latency and throughput are typically the key factors determining the quality of the user experience. ONNX…
Accelerating Model Inference with Optimum and Transformers Pipelines ★ 75
Hugging Face Blog · 1,540 days ago · Release
When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has…
Deploying GPT-J 6B for Inference with Hugging Face Transformers and Amazon SageMaker
Hugging Face Blog · 1,659 days ago · Tutorial
With the rise of open-source large language models, deploying these models in cloud environments in a secure, stable, and scalable manner has become a critical…
Scaling Inference Performance of BERT-like Models on Modern CPUs - Part 2
Hugging Face Blog · 1,727 days ago · Tutorial
This blog post is the second part of a technical guide co-authored by Hugging Face and Intel, designed to show developers how to push the inference performance…
How Hugging Face Sped Up Transformer Inference 100x for API Customers
Hugging Face Blog · 2,017 days ago · Release
In this technical blog post, the Hugging Face team reveals in detail how they achieved up to 100x speedup in inference for Transformer models for customers of…
