Latest in AI

Showing: gpu-optimization, Researchers

Real-Time LLM Inference on Standard GPUs at 3k Tokens/s per Request
Hacker News (AI keywords) · 60 days ago · Benchmark
The post’s title indicates a performance claim for real-time LLM inference on standard GPUs, reporting 3,000 tokens per second per request. No article body is available, so the underlying model, GPU type, batch size, latency profile, precision, serving stack, and benchmark method are not stated. The item is best treated as an inference-performance benchmark claim rather than a verified deployment guide.
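To make the headline figure concrete, the sketch below converts a per-request decode rate into average per-token latency and the time to stream a response of a given length. It assumes a steady decode rate and ignores prefill and time-to-first-token, which the post does not report. The 1,000-token response length is a hypothetical example, not a number from the post.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average gap between generated tokens for one request, in milliseconds.

    Assumes a steady decode rate (hypothetical simplification).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def decode_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady per-request rate.

    Excludes prefill / time-to-first-token, which the claim does not specify.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


claimed_rate = 3000.0  # tokens/s per request, as stated in the post's title
print(f"{per_token_latency_ms(claimed_rate):.3f} ms/token")      # 0.333 ms/token
print(f"{decode_time_s(1000, claimed_rate):.3f} s for 1,000 tokens")  # 0.333 s
```

At the claimed rate, each token would arrive roughly every third of a millisecond. Because the figure is per request, it says nothing on its own about aggregate throughput under batching.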
Waypoint-1.5: Running High-Fidelity Interactive Virtual Worlds on Home GPUs ★ 75
Hugging Face Blog · 110 days ago · Release
As artificial intelligence advances toward Embodied AI and real-world physical interaction, high-fidelity 3D simulation environments have long been an…
Hugging Face Releases New Technique: Enabling AI Agents to Automatically Write and Optimize Custom CUDA Kernels ★ 80
Hugging Face Blog · 165 days ago · New Tool
As demand for computational efficiency in deep learning models continues to grow, writing custom CUDA kernels (functions that run directly on the GPU) has become a key…
No Idle GPUs: Unlocking High-Performance Reinforcement Learning Training with Co-located vLLM in TRL ★ 85
Hugging Face Blog · 420 days ago · Release
In reinforcement learning from human feedback (RLHF) training for large language models, whether with PPO or the recently popular GRPO, there are…
TGI Multi-LoRA: Deploy Once, Serve 30 Fine-Tuned Models Simultaneously ★ 80
Hugging Face Blog · 740 days ago · Release
The official Hugging Face blog has introduced a major update to Text Generation Inference (TGI), its open-source text generation inference engine: the…