Latest in AI


Engineering: Heaps Do Lie — Debugging a Memory Leak in vLLM
Mistral AI News · 40 days ago · Tutorial
Mathis Felardos, a Mistral AI engineer, shares a technical deep-dive into tracking down a memory leak in vLLM, the widely adopted open-source LLM inference server. The investigation exposed a core frustration in systems debugging: heap profiling tools can actively mislead engineers rather than illuminate the true source of memory growth. The post offers practical engineering insight for teams operating LLM serving infrastructure in production.
Releasing Cohere North Mini Code
r/LocalLLaMA top day · 48 days ago · Release
Cohere’s Jay Alammar announced the official release of North Mini Code after early community feedback from r/LocalLLaMA. Weights are available on Hugging Face, including an fp8 version, and the model can be tried for free through OpenCode. For vLLM deployment, Cohere recommends using vLLM main for now and installing cohere_melody for accurate response parsing, while noting community requests for quantization and llama.cpp support.
The Evolution of vLLM from V0 to V1: Putting "Correctness over Corrections" into Practice in Reinforcement Learning (RL) ★ 75
Hugging Face Blog · 82 days ago · Opinion
This blog post published by the ServiceNow AI team delves into the major transition of the open-source large language model inference engine vLLM from V0 to…
Leaving No GPU Idle: Unlocking High-Performance Reinforcement Learning Training with Co-located vLLM in TRL ★ 85
Hugging Face Blog · 420 days ago · Release
In the reinforcement learning from human feedback (RLHF) training process for large language models — whether PPO or the recently popular GRPO — there are…
Efficient Request Queueing: A Key Strategy for Optimizing LLM Inference Performance ★ 75
Hugging Face Blog · 482 days ago · Tutorial
The Unique Challenges and Memory Bottlenecks of LLM Inference: Traditional web services primarily handle concurrent requests through multi-threading or…
Hugging Face TGI Announces Multi-Backend Support: Integrating TensorRT-LLM and vLLM ★ 85
Hugging Face Blog · 558 days ago · Release
Text Generation Inference (TGI), Hugging Face's open-source LLM inference and deployment framework, has received a major architectural update, officially…
Outlines-core 0.1.0 Officially Released: A High-Performance Structured Generation Library for Rust and Python ★ 75
Hugging Face Blog · 644 days ago · Release
In LLM application development, ensuring that a model outputs content that 100% conforms to a specific format — such as a JSON Schema, a regular expression, or…
Optimizing Your Large Language Model (LLM) in Production: A Hands-On Guide from Hugging Face ★ 85
Hugging Face Blog · 1,047 days ago · Tutorial
This technical guide from Hugging Face systematically introduces the core strategies for deploying and optimizing large language models (LLMs) in production…