Latest in AI

Showing: transformer

Tiny hackable CUDA language model implementation
Hacker News (AI keywords) · 52 days ago · New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with the CUDA toolkit listed in the setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Implementing a KV Cache from Scratch in nanoVLM ★ 75
Hugging Face Blog · 419 days ago · Tutorial
In the inference process of large language models (LLMs) and vision-language models (VLMs), autoregressive decoding is a major performance bottleneck. Each…
You Too Can Design State-of-the-Art Transformer Positional Encoding: From Intuition to the Mathematical Derivation of RoPE ★ 75
Hugging Face Blog · 610 days ago · Tutorial
This educational article from Hugging Face aims to guide readers, in the most intuitive, step-by-step way, to "reinvent" RoPE (Rotary Position Embedding)…
Jack of All Trades, Master of Some: Hugging Face Releases JAT, a Multi-Purpose Transformer Agent ★ 75
Hugging Face Blog · 827 days ago · Release
In the field of artificial intelligence, developing a "Generalist Agent", one capable of chatting, writing, controlling robots, and playing video games all at…
Hugging Face Integrates PatchTST: A Transformer Model Designed for Time Series Forecasting ★ 75
Hugging Face Blog · 908 days ago · Release
The official Hugging Face blog announced a major update: the integration of the PatchTST (Patch Time Series Transformer) model into its `transformers`…
Introducing RWKV: A New RNN Architecture with the Advantages of Transformers ★ 75
Hugging Face Blog · 1,170 days ago · Release
Hugging Face has announced official support for RWKV (Receptance Weighted Key Value) models in its `transformers` library. RWKV is an innovative architecture…
Nyströmformer: Approximating Self-Attention in Linear Time and Memory Complexity via the Nyström Method
Hugging Face Blog · 1,456 days ago · Release
This Hugging Face blog post provides a detailed introduction to Nyströmformer, a Transformer variant designed to overcome the bottleneck of processing long…
Hugging Face and Habana Labs Partner to Accelerate Transformer Model Training
Hugging Face Blog · 1,568 days ago · Release
Hugging Face and Intel's Habana Labs have officially announced a partnership aimed at providing the community with more efficient and cost-effective solutions…
BERT 101: A Complete Explanation of How the State-of-the-Art NLP Model Works
Hugging Face Blog · 1,609 days ago · Tutorial
BERT (Bidirectional Encoder Representations from Transformers) is a landmark natural language processing (NLP) model proposed by Google in 2018. This Hugging…
Perceiver IO: A Scalable, Fully Attentional Model That Works on Any Modality ★ 70
Hugging Face Blog · 1,686 days ago · Release
This article introduces DeepMind's Perceiver IO model and its integration into the Hugging Face Transformers library. Traditional Transformer models, while…
Building Smaller, Faster Language Models with Block Sparse Matrices
Hugging Face Blog · 2,147 days ago · Tutorial
In the field of natural language processing (NLP), the Transformer architecture has become the dominant paradigm, but its core self-attention mechanism…
Reformer: An Architecture Pushing the Limits of Long-Text Processing in Language Models
Hugging Face Blog · 2,216 days ago · Paper
This technical blog post published by Hugging Face takes a deep dive into how the Reformer architecture overcomes the memory and computational bottlenecks that…