Latest in AI

Hugging Face Introduces Parquet Content-Defined Chunking (CDC): Optimizing Deduplication and Transfer Efficiency for Large-Scale AI Datasets ★ 75
Hugging Face Blog · 368 days ago · Release
What Is Parquet Content-Defined Chunking (CDC)? In the AI and machine learning field, dataset sizes are growing at a staggering pace. Datasets on the…
The Evolution of Hugging Face's Storage Architecture: From Files to Chunks for Better Storage Efficiency ★ 75
Hugging Face Blog · 615 days ago · Release
The Hugging Face Hub currently hosts millions of AI models, datasets, and applications (Spaces), with total storage reaching the hundreds of petabytes. As the…
Improving Parquet Deduplication Efficiency on the Hugging Face Hub
Hugging Face Blog · 661 days ago · Release
The Hugging Face Hub, as the world's largest open-source AI community and dataset hosting platform, automatically converts datasets uploaded in various formats…
The Large-Scale Near-Deduplication Technique Behind BigCode ★ 75
Hugging Face Blog · 1,169 days ago · Tutorial
This technical blog post from Hugging Face takes an in-depth look at the challenges the BigCode project (the collaborative initiative behind StarCoder) faced…