Latest in AI

Tiny hackable CUDA language model implementation
Hacker News (AI keywords) | 52 days ago | New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive, byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, and lists the CUDA toolkit among its setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Implementing KV Cache from Scratch in nanoVLM ★ 75
Hugging Face Blog | 419 days ago | Tutorial
In the inference process of large language models (LLMs) and vision-language models (VLMs), autoregressive decoding is a major performance bottleneck. Each…
You Too Can Design State-of-the-Art Transformer Positional Encoding: From Intuition to the Mathematical Derivation of RoPE ★ 75
Hugging Face Blog | 610 days ago | Tutorial
This educational article from Hugging Face aims to guide readers, in the most intuitive, step-by-step way, to "reinvent" RoPE (Rotary Position Embedding)…
BERT 101: A Complete Explanation of How the State-of-the-Art NLP Model Works
Hugging Face Blog | 1,609 days ago | Tutorial
BERT (Bidirectional Encoder Representations from Transformers) is a landmark natural language processing (NLP) model proposed by Google in 2018. This Hugging…