Latest in AI

Showing: efficiency

Startup Subquadratic Claims It Broke Through a Core LLM Mathematical Bottleneck
MIT Tech Review AI · 39 days ago · Business
Miami-based AI startup Subquadratic emerged from stealth last month claiming to have solved a fundamental mathematical bottleneck that has held back large language models for nearly a decade. Initial reactions from the research community were skeptical, with technical details appearing sparse relative to the scale of the claim. The company has since begun releasing more substantive supporting evidence, and MIT Technology Review investigates whether it can back up its assertions.
The $1,500-Trained HRM Model Backed by HuggingFace CEO and Bengio's Team
量子位 QbitAI · 44 days ago · Paper
An HRM model trained for just $1,500 has gone viral in AI circles after drawing strong recommendations from HuggingFace CEO Clem Delangue and backing from a team affiliated with Turing Award laureate Yoshua Bengio. The story underscores a growing industry fascination with cost-efficient AI training. Its rapid spread suggests that the community sees it as evidence that meaningful model development no longer requires million-dollar compute budgets.
Falcon-H1: A Hybrid-Head Language Model Series Redefining Efficiency and Performance ★ 75
Hugging Face Blog · 433 days ago · Release
The Technology Innovation Institute (TII) of the UAE recently unveiled a new open-source language model series on the Hugging Face blog…
Train 400x Faster Static Embedding Models with Sentence Transformers ★ 75
Hugging Face Blog · 559 days ago · Release
What are static embeddings? In today's NLP landscape, Transformer-based embedding models (such as BERT and mE5) have become mainstream, as they…
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 ★ 80
Hugging Face Blog · 706 days ago · Tutorial
When fine-tuning or pre-training large language models (LLMs), input sequences typically vary in length. The traditional approach is to use…