Latest in AI

Showing: dataset · Researchers

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Hugging Face Blog · 54 days ago · Benchmark
ServiceNow AI published a Hugging Face Blog post titled “EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios.” Based only on the title, it appears to be a benchmark dataset update involving tool-use or scenario-based AI evaluation. The exact domains, tools, scenario design, licensing, supported models, and evaluation methodology cannot be confirmed without the full article.
Nemotron-Personas-India: A Localized Indian Synthetic Dataset for Sovereign AI ★ 75
Hugging Face Blog · 287 days ago · Release
As "Sovereign AI" becomes a global trend, countries around the world are actively seeking to build AI models that reflect their own culture, values, and…
NVIDIA Launches Nemotron-Personas-Japan: A Synthetic Dataset Built for Japan's Sovereign AI ★ 75
Hugging Face Blog · 305 days ago · Release
NVIDIA has released a new synthetic dataset on Hugging Face called "Nemotron-Personas-Japan," a critical resource designed specifically to advance Japan's…
SandboxAQ Launches the SAIR Dataset: Accelerating Pharmaceutical R&D with AI Structural Intelligence ★ 72
Hugging Face Blog · 328 days ago · Release
SandboxAQ — an AI and quantum technology pioneer spun out of Alphabet — has officially launched an open-source dataset called SAIR (Structural AI for Research)…
NVIDIA Open-Sources a 6-Million-Sample Multilingual Reasoning Dataset on Hugging Face ★ 78
Hugging Face Blog · 341 days ago · Release
NVIDIA has officially released a massive "Multi-Lingual Reasoning Dataset" containing 6 million samples on the Hugging Face platform. This significant…
Arc Virtual Cell Challenge Primer: A New Challenge in Simulating Cell Biology with AI ★ 75
Hugging Face Blog · 375 days ago · Release
The "Virtual Cell" is one of the ultimate goals at the intersection of systems biology and artificial intelligence, aiming to fully simulate the physiological…
LeRobot Community Datasets: When and How Will Robotics Get Its "ImageNet"? ★ 80
Hugging Face Blog · 443 days ago · Opinion
In the history of artificial intelligence, the ImageNet dataset, and the 2012 breakthrough achieved on it, is widely recognized as the key catalyst that ignited the deep…
Hugging Face Launches the LeRobot Autonomous Driving Dataset: The World's Largest Open-Source Autonomous Driving Dataset Goes Live ★ 85
Hugging Face Blog · 504 days ago · Release
Hugging Face LeRobot Enters New Territory: Launches the World's Largest Open-Source Autonomous Driving Dataset. Hugging Face's open-source robotics project…
Hugging Face Releases vid_ds_scripts: One-Stop Tooling for Building High-Quality Video Generation Datasets ★ 75
Hugging Face Blog · 531 days ago · New Tool
With the rise of open-source video generation models such as LTX-Video, HunyuanVideo, and CogVideoX, building high-quality training datasets has become the…
Hugging Face Launches Synthetic Data Generator: Easily Build AI Training Datasets with Natural Language ★ 82
Hugging Face Blog · 589 days ago · New Tool
Hugging Face launched a brand-new "Synthetic Data Generator" in December 2024 — a web-based, no-code tool designed to allow anyone to create high-quality AI…
Hugging Face Community Releases an Open Preference Dataset for Text-to-Image Generation ★ 75
Hugging Face Blog · 596 days ago · Release
Introduction: An Important Piece of the Open-Source Image Generation Puzzle. As text-to-image (T2I) technology advances rapidly, ensuring that AI-generated…
Argilla 2.4 Released: Build Fine-Tuning and Evaluation Datasets on the Hugging Face Hub Without Code ★ 75
Hugging Face Blog · 631 days ago · Release
The open-source data curation and annotation platform Argilla has officially released version 2.4, with the core of this update being deep integration with…
CinePile 2.0: Building a Stronger Long-Video Question-Answering Dataset with Adversarial Refinement ★ 75
Hugging Face Blog · 643 days ago · Release
CinePile is a multimodal question-answering dataset focused on movie and long-video understanding. In traditional dataset construction, researchers commonly…
🇨🇿 BenCzechMark: Can Your LLM Understand Czech? A New Czech Benchmark Released
Hugging Face Blog · 665 days ago · Release
The Hugging Face team and its collaborators have jointly launched a new benchmark called "BenCzechMark," designed to evaluate the understanding and generation…
Behind the Scenes of FineVideo: How Hugging Face Built a High-Quality Open-Source Video Dataset ★ 75
Hugging Face Blog · 673 days ago · Release
With the explosion of video generation and understanding models such as Sora and Gen-3, high-quality video training data has become a key battleground for…
Hugging Face Launches Docmatix: A Massive Open-Source Dataset for Document Visual Question Answering (DocVQA) ★ 75
Hugging Face Blog · 740 days ago · Release
The Hugging Face official blog has announced the release of a new, massive dataset called "Docmatix," specifically designed for training and fine-tuning…
Replicate Intelligence #7: The Importance of Data Curation and Data Generation
Replicate Blog · 746 days ago · Commentary
In the current wave of generative AI, the industry's attention is gradually shifting from "fine-tuning model architectures" to "improving data quality." Issue…
Hugging Face "Data Is Better Together" Community Data Collaboration: A Look Back and Ahead
Hugging Face Blog · 768 days ago · Release
Background: In the current development of large language models (LLMs), high-quality alignment data (such as the preference data required for RLHF and DPO)…
Replicate Intelligence #2: Faster Image Generation, AI-Driven World Simulators, and Insights into AI Dataset Complexity ★ 75
Replicate Blog · 788 days ago · Commentary
Replicate's technical newsletter, Replicate Intelligence #2, takes a deep dive into three of the most hotly discussed trends in the open-source AI community…
StarCoder2-Instruct: Fully Transparent and Permissively Licensed Self-Alignment for Code Generation ★ 75
Hugging Face Blog · 820 days ago · Release
Background and Challenges: In the field of code generation, instruction tuning is the key to improving a model's practical utility and alignment with human…
Cosmopedia: How to Create Large-Scale Synthetic Data for Pre-Training Large Language Models ★ 85
Hugging Face Blog · 860 days ago · Release
Hugging Face has officially released Cosmopedia, currently the largest fully open-source synthetic dataset designed for the pre-training of large language…
Hugging Face Launches the WebSight Dataset: Unlocking Direct Conversion of Webpage Screenshots into HTML Code ★ 75
Hugging Face Blog · 865 days ago · Release
The Hugging Face official blog has published a post introducing WebSight, a brand-new open-source dataset designed to address the bottleneck that multimodal…
StarCoder2 and The Stack v2 Officially Released: A New Generation of Open-Source Code LLMs and a Massive Dataset ★ 80
Hugging Face Blog · 881 days ago · Release
The BigCode community, jointly led by Hugging Face and ServiceNow, together with NVIDIA, has officially announced the launch of a new generation of open-source…
Introducing Prodigy-HF: Direct Integration with Hugging Face
Hugging Face Blog · 994 days ago · New Tool
Prodigy, the well-known machine learning data annotation tool from Explosion (the company behind the popular NLP library spaCy), has officially released a…
Interactively Explore and Inspect Hugging Face Datasets with a Single Line of Code
Hugging Face Blog · 1,007 days ago · New Tool
This article introduces the integration between Hugging Face and the open-source data exploration tool Renumics Spotlight, aimed at addressing the pain point…
Hugging Face Launches IDEFICS: An Open-Source Reproduction of the SOTA Multimodal Vision-Language Model Flamingo ★ 78
Hugging Face Blog · 1,071 days ago · Release
Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model…
Hugging Face Ethics and Society Newsletter #4: Bias in Text-to-Image Models
Hugging Face Blog · 1,128 days ago · Commentary
The Hugging Face Ethics and Society team has published the fourth edition of its newsletter, this time focusing on the problem of "bias" in text-to-image (T2I)…
Let's Talk About Bias in Machine Learning! Hugging Face Ethics and Society Newsletter, Issue 2
Hugging Face Blog · 1,321 days ago · Opinion
This second issue of the newsletter from Hugging Face's Ethics and Society team centers on the theme of "Biases in Machine Learning." As AI technology becomes…
Automated Image Collection: Gathering Thousands of Labeled Images with CLIP and LAION-5B
Replicate Blog · 1,453 days ago · Tutorial
In the fields of artificial intelligence and computer vision, collecting high-quality, labeled image datasets is typically a time-consuming and tedious task…
