Latest in AI

Showing: inference · Researchers

Efficient Request Queueing: A Key Strategy for Optimizing LLM Inference Performance ★ 75
Hugging Face Blog · 482 days ago · Tutorial
The Unique Challenges and Memory Bottlenecks of LLM Inference. Traditional web services primarily handle concurrent requests through multi-threading or…
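The teaser only hints at the technique, so what follows is a minimal sketch under stated assumptions, not the article's implementation. It assumes the scarce resource is KV-cache memory rather than threads, that each request reserves its worst-case footprint (prompt tokens plus `max_new_tokens`), and that queued requests are admitted first-in, first-out while a token budget allows. `Request`, `TokenBudgetQueue`, and the budget numbers are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; token counts determine KV-cache cost.
    req_id: str
    prompt_tokens: int
    max_new_tokens: int

    @property
    def kv_tokens(self) -> int:
        # Worst-case KV-cache footprint, in tokens, if generation runs to max_new_tokens.
        return self.prompt_tokens + self.max_new_tokens


class TokenBudgetQueue:
    """FIFO request queue that admits work only while a KV-cache token budget allows."""

    def __init__(self, kv_budget_tokens: int, max_queue_len: int):
        self.kv_budget = kv_budget_tokens
        self.max_queue_len = max_queue_len
        self.waiting: deque[Request] = deque()
        self.running: dict[str, Request] = {}
        self.used = 0

    def submit(self, req: Request) -> bool:
        # Reject immediately (e.g. map to HTTP 429) instead of growing the queue without bound,
        # and reject requests that could never fit in the budget at all.
        if len(self.waiting) >= self.max_queue_len or req.kv_tokens > self.kv_budget:
            return False
        self.waiting.append(req)
        return True

    def schedule(self) -> list[str]:
        # Strict FIFO: stop at the first request that does not fit, so a large
        # request at the head is not starved by smaller ones queued behind it.
        admitted = []
        while self.waiting and self.used + self.waiting[0].kv_tokens <= self.kv_budget:
            req = self.waiting.popleft()
            self.running[req.req_id] = req
            self.used += req.kv_tokens
            admitted.append(req.req_id)
        return admitted

    def finish(self, req_id: str) -> None:
        # Release the reservation held by a completed request.
        self.used -= self.running.pop(req_id).kv_tokens


q = TokenBudgetQueue(kv_budget_tokens=4096, max_queue_len=8)
for r in (Request("a", 1000, 1000), Request("b", 1500, 500), Request("c", 500, 500)):
    q.submit(r)
print(q.schedule())  # ['a', 'b']; admitting 'c' would push usage past 4096
q.finish("a")
print(q.schedule())  # ['c']
```

Reserving by tokens rather than counting threads reflects the memory-bottleneck framing in the teaser; a real server would typically also batch the admitted requests together on the GPU, which this sketch leaves out.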
Accelerating Large Language Model (LLM) Inference with TGI on Intel Gaudi ★ 75
Hugging Face Blog · 487 days ago · Release
Hugging Face's official blog has announced that its widely adopted open-source large model inference framework, Text Generation Inference (TGI), now officially…
Hugging Face Inference Endpoints Launches a New Analytics Dashboard, Improving Model Monitoring and the MLOps Experience
Hugging Face Blog · 494 days ago · Release
Hugging Face recently announced a major upgrade to its hosted model deployment service, "Inference Endpoints," introducing a brand-new and far more modern…
Hugging Face Adds Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita AI ★ 75
Hugging Face Blog · 525 days ago · Release
On February 18, 2025, Hugging Face announced the addition of three new partners to its serverless inference ecosystem: Hyperbolic, Nebius AI Studio, and Novita…
Welcome Fireworks.ai to the Hugging Face Hub 🎆 ★ 75
Hugging Face Blog · 529 days ago · Release
On February 14, 2025, Hugging Face, the leading open-source AI community, officially announced the integration of the high-performance AI inference platform…
Lessons from 1 Billion Classifications: Hugging Face Shares How to Run Large-Scale Classification with Open-Source Models, Fast and at Very Low Cost ★ 80
Hugging Face Blog · 530 days ago · Tutorial
With generative AI sweeping the globe, many developers habitually feed every task, including simple text classification, sentiment analysis…
How to Deploy and Fine-Tune DeepSeek Models on AWS: An Official Hugging Face Guide ★ 85
Hugging Face Blog · 544 days ago · Tutorial
As DeepSeek-R1 swept through the AI landscape on the strength of its powerful reasoning capabilities, how to safely and efficiently deploy and fine-tune these…
Hugging Face Hub Launches "Inference Providers": Switch Between Multiple Third-Party High-Performance Inference Providers in One Click ★ 85
Hugging Face Blog · 546 days ago · Release
Hugging Face has officially launched the "Inference Providers" feature on the Hugging Face Hub — a major update designed to address the pain points developers…
Hugging Face TGI Adds Multi-Backend Support, Integrating TensorRT-LLM and vLLM ★ 85
Hugging Face Blog · 558 days ago · Release
Text Generation Inference (TGI), Hugging Face's open-source LLM inference and deployment framework, has received a major architectural update, officially…
Replicate Now Supports NVIDIA L40S GPUs: Better Performance, Lower Cost
Replicate Blog · 620 days ago · New Tool
The AI deployment platform Replicate has announced the official availability of NVIDIA L40S GPU compute on its platform. This update aims to provide developers…
Fine-Tuning LLMs to 1.58 Bits: Extreme Quantization Made Easy ★ 85
Hugging Face Blog · 678 days ago · Tutorial
The deployment of large language models (LLMs) has long faced a dual bottleneck of VRAM capacity and memory bandwidth. Microsoft previously introduced the…
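The "1.58-bit" figure comes from restricting each weight to three values, {-1, 0, +1}, since log2(3) ≈ 1.58. The sketch below is a minimal plain-Python illustration of the absmean rule described in the BitNet b1.58 paper (divide by the mean absolute weight, round, clip to [-1, 1]), assuming a single per-tensor scale. It is not taken from the article; the function names and the sample matrix are hypothetical.

```python
from __future__ import annotations

import math


def absmean_ternary(weights: list[list[float]], eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} using one per-tensor absmean scale."""
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps  # mean |W|, plus eps to avoid dividing by 0
    # Divide by the scale, round to the nearest integer, then clip into [-1, 1].
    # (Python's round() sends exact .5 ties to the even integer.)
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


def dequantize(q: list[list[int]], scale: float) -> list[list[float]]:
    # Approximate reconstruction: each ternary value times the shared scale.
    return [[v * scale for v in row] for row in q]


W = [[0.4, -0.05, -1.2], [0.1, 0.9, -0.3]]
q, scale = absmean_ternary(W)
print(q)             # [[1, 0, -1], [0, 1, -1]]
print(math.log2(3))  # about 1.585 bits of information per ternary weight
```

In quantization-aware training or fine-tuning, a quantizer like this is typically applied in the forward pass while gradients flow to full-precision weights through a straight-through estimator; that training loop is omitted here.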
Introduction to GGML: The Key Technology for Running Large Language Models Efficiently on Consumer Hardware ★ 80
Hugging Face Blog · 714 days ago · Tutorial
GGML is a lightweight, zero-dependency C/C++ tensor library developed by Georgi Gerganov. It was originally designed to enable efficient local inference of the…
Hugging Face Teams Up with NVIDIA NIM to Launch Serverless Inference ★ 82
Hugging Face Blog · 729 days ago · Release
Hugging Face and NVIDIA announced a major partnership in late July 2024, officially launching a serverless inference service powered by NVIDIA NIM (NVIDIA…
TGI Multi-LoRA: Deploy Once, Serve 30 Fine-Tuned Models Simultaneously ★ 80
Hugging Face Blog · 740 days ago · Release
The official Hugging Face blog has introduced a major update to its open-source text generation inference engine, Text Generation Inference (TGI): the…
Google Cloud TPUs Arrive on Hugging Face, with Support for Inference Endpoints and Spaces ★ 75
Hugging Face Blog · 749 days ago · Release
Hugging Face announced a deep partnership with Google Cloud, officially integrating Google Cloud TPUs (Tensor Processing Units) into the Hugging Face platform…
NVIDIA H100 GPUs Coming Soon to Replicate: Faster Model Inference and Training
Replicate Blog · 776 days ago · Release
The official blog of Replicate, the popular AI model hosting and deployment platform, has announced that NVIDIA H100 Tensor Core GPUs will soon be officially…
Benchmarking Text Generation Inference (TGI): How to Measure and Optimize LLM Inference Performance ★ 75
Hugging Face Blog · 790 days ago · Tutorial
This official Hugging Face blog post takes an in-depth look at how to benchmark Text Generation Inference (TGI), Hugging Face's open-source LLM inference and…
Easily Deploy Models to AWS Inferentia2 Chips on Hugging Face ★ 75
Hugging Face Blog · 797 days ago · Release
Hugging Face has announced official support for AWS Inferentia2 (Inf2) instances within its hosted Inference Endpoints service. This update gives developers…
Building Cost-Effective Enterprise RAG Applications with Intel Gaudi 2 and Intel Xeon ★ 70
Hugging Face Blog · 810 days ago · Tutorial
As enterprise demand for Retrieval-Augmented Generation (RAG) technology surges, how to maintain high performance while controlling hardware costs has become…
Running Privacy-Preserving Fully Homomorphic Encryption (FHE) Inference on Hugging Face Endpoints ★ 75
Hugging Face Blog · 833 days ago · Release
This article introduces how to run privacy-preserving inference based on Fully Homomorphic Encryption (FHE) on Hugging Face Endpoints. In traditional…
Optimum-NVIDIA: Unlock Blazing-Fast LLM Inference with a Single Line of Code ★ 80
Hugging Face Blog · 966 days ago · Release
Hugging Face announced the launch of a new open-source library called "Optimum-NVIDIA," the result of a deep collaboration with NVIDIA, aimed at seamlessly…
Goodbye Cold Starts: How Hugging Face Made LoRA Inference 300% Faster ★ 85
Hugging Face Blog · 966 days ago · Release
In real-world generative AI applications, fine-tuning for specific tasks or clients is a common requirement. However, deploying a full base model for every…
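The reasoning behind avoiding one full model per task or client: a LoRA adapter only adds a low-rank update to frozen base weights, so a single copy of the base can serve many adapters, and switching adapters means swapping small matrices rather than loading a new model. The sketch below illustrates that idea in plain Python, assuming the standard LoRA forward pass y = Wx + (alpha/r)·B(Ax). `SharedBaseLinear`, `load_adapter`, and the adapter names are hypothetical; this is not Hugging Face's serving code and says nothing about how the article achieved its speedup.

```python
def matvec(m, v):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedBaseLinear:
    """One frozen base weight matrix shared by many LoRA adapters (hypothetical sketch)."""

    def __init__(self, base_weight):
        self.W = base_weight  # d_out x d_in, loaded once for all adapters
        self.adapters = {}    # adapter name -> (A, B, scaling)

    def load_adapter(self, name, A, B, alpha):
        # A is r x d_in and B is d_out x r; only these small matrices are stored per adapter.
        r = len(A)
        self.adapters[name] = (A, B, alpha / r)

    def forward(self, x, adapter=None):
        y = matvec(self.W, x)
        if adapter is None:
            return y  # plain base-model output
        A, B, s = self.adapters[adapter]
        delta = matvec(B, matvec(A, x))  # B(Ax): never materializes the full d_out x d_in update
        return [yi + s * di for yi, di in zip(y, delta)]


layer = SharedBaseLinear([[1.0, 0.0], [0.0, 1.0]])
layer.load_adapter("client_a", A=[[1.0, 1.0]], B=[[1.0], [0.0]], alpha=2.0)
layer.load_adapter("client_b", A=[[0.0, 1.0]], B=[[0.0], [1.0]], alpha=1.0)
print(layer.forward([1.0, 2.0]))              # [1.0, 2.0]
print(layer.forward([1.0, 2.0], "client_a"))  # [7.0, 2.0]
print(layer.forward([1.0, 2.0], "client_b"))  # [1.0, 4.0]
```

Computing B(Ax) instead of (BA)x keeps the per-request cost proportional to the rank r, which is what makes serving many adapters on one base model cheap.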
Make Your Llama Generation Fly: Acceleration with AWS Inferentia2 ★ 72
Hugging Face Blog · 994 days ago · Tutorial
As large language models (LLMs) such as Llama 2 become more widely adopted, achieving efficient and cost-effective inference in production environments has…
Easily Deploy High-Performance Embedding Models with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 1,008 days ago · Release
As large language models (LLMs) and Retrieval-Augmented Generation (RAG) technology become increasingly widespread, embedding models have become an…
Hugging Face Launches Dedicated Inference for PRO Subscribers: Higher Rate Limits and Support for Large Open Models ★ 70
Hugging Face Blog · 1,040 days ago · Release
The official Hugging Face blog has announced "Inference for PROs," an upgraded service for PRO subscribers ($9 per month). This service is designed to…
A Complete Guide to Native Quantization Support in Hugging Face Transformers: Hands-On with bitsandbytes and GPTQ ★ 75
Hugging Face Blog · 1,050 days ago · Tutorial
As the parameter count of large language models (LLMs) has grown dramatically, running and fine-tuning these models on consumer-grade GPUs or limited hardware…
Fetch Cuts ML Processing Latency by 50% with Amazon SageMaker and Hugging Face
Hugging Face Blog · 1,061 days ago · Business
This case study examines how Fetch, a leading consumer rewards platform in the United States, leveraged the collaboration between Amazon SageMaker and Hugging…
A Panoramic Guide to Hugging Face's Open-Source Text Generation and LLM Ecosystem ★ 85
Hugging Face Blog · 1,107 days ago · Release
This official Hugging Face blog post systematically maps out the complete ecosystem it has built around open-source large language models (LLMs). As…
Easily Deploy Large Language Models (LLMs) with Hugging Face Inference Endpoints ★ 75
Hugging Face Blog · 1,120 days ago · Tutorial
This official Hugging Face blog post introduces how to use their hosted service "Inference Endpoints" to deploy large language models (LLMs). With the rapid…
The Falcon Family of Open Models Arrives in the Hugging Face Ecosystem ★ 75
Hugging Face Blog · 1,149 days ago · Release
The Falcon series of large language models (including Falcon-40B and Falcon-7B), developed by Abu Dhabi's Technology Innovation Institute (TII), have…
