Latest in AI

Showing: synthetic-data, Researchers

Decart’s new world model can simulate hours of photorealistic driving
TechCrunch AI · 48 days ago · New Tool
Decart is launching Oasis 3, a real-time world model designed to generate photorealistic driving environments for autonomous vehicle testing. The headline says it can simulate hours of driving, while also noting there are caveats. The model is now available through an API, giving developers a way to build applications or testing workflows on top of it.
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
Hugging Face Blog · 54 days ago · Tutorial
The post appears to focus on generating synthetic Q&A data from task seeds for Nemotron pretraining. Rather than a model launch, it likely emphasizes data generation and pretraining corpus design. Because the original article text is unavailable here, concrete claims about dataset scale, benchmarks, or implementation details should not be inferred.
Distillation Panic: Why Calling "Knowledge Distillation" a Security Attack Is an Extremely Bad Trend ★ 75
Interconnects (Nathan L.) · 84 days ago · Opinion
In the field of machine learning, "knowledge distillation" is a well-established technique that generally refers to using the output data generated by a…
Lossy Self-Improvement: Why AI Self-Improvement Is Real but Won't Lead to a "Fast Takeoff" ★ 75
Interconnects (Nathan L.) · 127 days ago · Opinion
This article takes a deep dive into one of the most contentious topics in artificial intelligence: AI "self-improvement" and whether it will trigger a "fast…
Build a Domain-Specific Embedding Model in One Day: A Hands-On Guide from Hugging Face and NVIDIA ★ 80
Hugging Face Blog · 129 days ago · Tutorial
When building Retrieval-Augmented Generation (RAG) systems, general-purpose embedding models (such as those from OpenAI or common open-source alternatives)…
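ImportAI 449: LLMs Training LLMs, 72B Distributed Training, Why Computer Vision Is Harder Than Text Generation, and Whether AI Will Trigger a Political Transition ★ 75
Import AI (Jack Clark) · 134 days ago · Commentary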
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
We Got Claude to Write CUDA Kernels and Teach Open Models! Hugging Face Releases the Upskill Project ★ 80
Hugging Face Blog · 181 days ago · Release
Background and Challenge: Why Is CUDA Programming So Hard for AI? CUDA (Compute Unified Device Architecture) is a parallel computing platform and…
Nemotron-Personas-India: A Localized Indian Synthetic Dataset Built for Sovereign AI ★ 75
Hugging Face Blog · 287 days ago · Release
As "Sovereign AI" becomes a global trend, countries around the world are actively seeking to build AI models that reflect their own culture, values, and…
NVIDIA Releases Nemotron-Personas-Japan: A Synthetic Dataset Built for Japan's Sovereign AI ★ 75
Hugging Face Blog · 305 days ago · Release
NVIDIA has released a new synthetic dataset on Hugging Face called "Nemotron-Personas-Japan," a critical resource designed specifically to advance Japan's…
ServiceNow AI Releases SyGra: A One-Stop Synthetic Data Generation Framework for LLMs and SLMs ★ 75
Hugging Face Blog · 309 days ago · Release
ServiceNow AI recently published a post on the Hugging Face blog introducing a brand-new open-source framework called "SyGra" — a one-stop synthetic data…
Hugging Face Launches Synthetic Data Generator: Easily Build AI Training Datasets with Natural Language ★ 82
Hugging Face Blog · 589 days ago · New Tool
Hugging Face launched a brand-new "Synthetic Data Generator" in December 2024 — a web-based, no-code tool designed to allow anyone to create high-quality AI…
How to Build a Dedicated Argilla 2.0 Chatbot with distilabel ★ 75
Hugging Face Blog · 742 days ago · Tutorial
In the AI field, quickly building a chatbot that can accurately answer questions about a specific domain or newly released software has always been a major…
Replicate Intelligence #7: The Importance of Data Curation and Data Generation
Replicate Blog · 746 days ago · Commentary
In the current wave of generative AI, the industry's attention is gradually shifting from "fine-tuning model architectures" to "improving data quality." Issue…
Cosmopedia: How to Create Large-Scale Synthetic Data for Pre-Training Large Language Models ★ 85
Hugging Face Blog · 860 days ago · Release
Hugging Face has officially released Cosmopedia, currently the largest and fully open-source synthetic dataset designed for the pre-training of large language…
Open-Source Synthetic Data: How It Helps You Save Money, Save Time, and Reduce Carbon Emissions ★ 75
Hugging Face Blog · 893 days ago · Opinion
This article takes an in-depth look at the critical role of "synthetic data" in the open-source ecosystem, and explains how it helps enterprises and developers…
Efficient Table Pre-Training Without Real Data: An Introduction to the TAPEX Concept and Its Hugging Face Integration
Hugging Face Blog · 1,527 days ago · Release
When working with structured data such as tables, traditional pre-trained models typically require crawling large amounts of real-world tables and related text…
