Latest in AI

Unlocking VLM Potential on Satellite Imagery Through Fine-Tuning
Mistral AI News · 50 days ago · Tutorial
Mistral AI demonstrates how LoRA fine-tuning adapts Pixtral-12B to satellite imagery, a specialized visual domain where prompting alone is unreliable. Using the Aerial Image Dataset, the post compares a prompt-based baseline against a fine-tuned model across 30 scene classes. Accuracy rose from 0.56 to 0.91, while invalid label hallucinations dropped from 5% to 0.1%.
Google Announces Gemma 4: A Frontier Multimodal Open Model Designed for On-Device Use ★ 85
Hugging Face Blog · 117 days ago · Release
Google and Hugging Face have jointly announced a new generation of open-weight models — "Gemma 4." This model represents a major breakthrough in on-device AI…
TII Launches the New Falcon Perception Multimodal Perception Model ★ 75
Hugging Face Blog · 118 days ago · Release
The Technology Innovation Institute (TII) of the UAE has officially announced the launch of its new "Falcon Perception" model on the Hugging Face blog. As an…
Holotron-12B: A High-Throughput Computer Use AI Agent Model Released ★ 75
Hugging Face Blog · 133 days ago · Release
Hcompany has officially released a new model on Hugging Face called Holotron-12B, positioned as a "High Throughput Computer Use Agent." Although only the…
H Company Launches the New Holo2-235B Model, Leading in UI Element Localization ★ 75
Hugging Face Blog · 174 days ago · Release
French AI startup H Company (formerly Holistic AI, founded by former Google DeepMind researchers) announced on the Hugging Face blog the launch of its new…
Run Isaac 0.1 on Replicate: A Lightweight Embodied Vision-Language Model Designed for Real-World Perception
Replicate Blog · 244 days ago · Release
The cloud AI model deployment and hosting platform Replicate has officially announced support for running the new lightweight vision-language model (VLM) —…
Greatly Boost Your OCR Workflow Efficiency with Open-Source Models ★ 80
Hugging Face Blog · 280 days ago · Tutorial
Traditional OCR systems (such as Tesseract) often struggle with complex layouts, multi-column tables, handwriting, and mathematical formulas, while using…
Run Vision-Language Models (VLMs) on Intel CPUs in Just Three Simple Steps ★ 70
Hugging Face Blog · 286 days ago · Tutorial
Visual Language Models (VLMs) combine computer vision with natural language processing, enabling complex tasks such as image captioning and visual question…
Hugging Face TRL Supports Vision-Language Model (VLM) Alignment: Multimodal DPO and ORPO Training Made Easy ★ 80
Hugging Face Blog · 355 days ago · Release
Hugging Face's TRL (Transformer Reinforcement Learning) is a popular open-source library specifically designed for aligning language models (LLMs). In its…
Hugging Face Launches an Efficient Multimodal Data Pipeline (MMDP): A Data-Processing Tool for Accelerating VLM and Multimodal Model Training ★ 75
Hugging Face Blog · 385 days ago · New Tool
With the rapid development of vision-language models (VLMs) and multimodal AI, the amount of data required to train these models has grown explosively…
NVIDIA Llama Nemotron Nano VLM Officially Arrives on the Hugging Face Hub ★ 75
Hugging Face Blog · 395 days ago · Release
NVIDIA has partnered with Hugging Face to officially bring its latest lightweight vision-language model (VLM), the NVIDIA Llama Nemotron Nano VLM, to the…
Implementing KV Cache in nanoVLM from Scratch ★ 75
Hugging Face Blog · 419 days ago · Tutorial
In the inference process of large language models (LLMs) and vision-language models (VLMs), autoregressive decoding is a major performance bottleneck. Each…
Hcompany Launches Holo1: A New Family of GUI Automation VLMs Powering the Surfer-H Agent ★ 78
Hugging Face Blog · 420 days ago · Release
H (formerly Holistic AI), a highly regarded French AI startup, recently released a new family of vision-language models (VLMs) on the Hugging Face…
nanoVLM: The Simplest Open-Source Project for Training Vision-Language Models (VLMs) in Pure PyTorch ★ 75
Hugging Face Blog · 433 days ago · Release
Hugging Face recently launched an open-source project called nanoVLM, positioned as "the simplest repository for training Vision Language Models (VLMs) in pure…
Hugging Face Releases Its 2025 Vision-Language Model (VLM) Guide: A New Open-Source Era That Is Stronger, Faster, and More Practical ★ 80
Hugging Face Blog · 442 days ago · Opinion
With the explosion of multimodal technology, Vision Language Models (VLMs) have evolved from laboratory research prototypes into core tools for enterprises and…
Introducing AutoRound: Intel's Advanced Quantization Technique for LLMs and VLMs ★ 75
Hugging Face Blog · 455 days ago · Release
As large language models (LLMs) and vision language models (VLMs) continue to scale up, running these models on limited hardware resources — such as…
Fine-Tuning olmOCR to Build a High-Fidelity OCR Engine ★ 75
Hugging Face Blog · 461 days ago · Tutorial
With the proliferation of vision-language models (VLMs), using VLMs for document OCR (e.g., converting PDFs to Markdown) has become mainstream…
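Visual Salamandra 7B Released: Barcelona Supercomputing Center Launches an Open-Source Multimodal Large Model Focused on Multilingual and Visual Understanding ★ 70
Hugging Face Blog · 473 days ago · Release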
The Language Technologies department (BSC-LT) of the Barcelona Supercomputing Center (BSC) recently released a new open-source multimodal model on Hugging Face…
A Deep Dive into Aya Vision: Advancing the Frontier of Multilingual Multimodal AI ★ 75
Hugging Face Blog · 511 days ago · Release
Cohere For AI (C4AI) has officially launched "Aya Vision," a series of open-source multimodal models (available in 8B and 32B parameter versions) designed…
Google Launches SigLIP 2: A More Powerful Multilingual Vision-Language Encoder ★ 80
Hugging Face Blog · 522 days ago · Release
Google has officially launched SigLIP 2, a major upgrade to its widely popular SigLIP (Sigmoid Loss for Language-Image Pre-training) vision-language encoder…
SmolVLM2: A Lightweight Vision-Language Model That Brings Video Understanding to Every Device ★ 80
Hugging Face Blog · 523 days ago · Release
Hugging Face has introduced SmolVLM2, the latest addition to its Smol family of lightweight models. SmolVLM2 is designed to bring advanced vision-language…
Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models ★ 80
Hugging Face Blog · 524 days ago · Release
Google has officially launched the PaliGemma 2 Mix model series — a new family of open-source instruction-tuned vision-language models (VLMs) now available on…
Hugging Face Releases vid_ds_scripts: One-Stop Construction of High-Quality Datasets for Video Generation ★ 75
Hugging Face Blog · 531 days ago · New Tool
With the rise of open-source video generation models such as LTX-Video, HunyuanVideo, and CogVideoX, building high-quality training datasets has become the…
Hugging Face's Lightweight Agent Framework smolagents Now Officially Supports Vision-Language Models (VLMs)! ★ 80
Hugging Face Blog · 550 days ago · Release
On January 24, 2025, Hugging Face announced that smolagents — its open-source library designed for building lightweight, high-performance AI agents — now…
Hugging Face Launches Even Lighter SmolVLM: New 256M and 500M Ultra-Small Vision-Language Models Arrive! ★ 75
Hugging Face Blog · 551 days ago · Release
Hugging Face has officially introduced the newest members of the SmolVLM family, pushing vision-language model (VLM) sizes even further down to 256M (256…
Visual Document Retrieval Goes Multilingual: Hugging Face Launches the VDR-2B-multilingual Model ★ 80
Hugging Face Blog · 564 days ago · Release
Hugging Face has recently released a new Visual Document Retrieval (VDR) model, VDR-2B-multilingual. This technology marks a formal transition in document…
Google Launches the New Vision-Language Model PaliGemma 2: A Lightweight Multimodal Model Based on Gemma 2 ★ 80
Hugging Face Blog · 600 days ago · Release
Google and Hugging Face have jointly announced the release of a new-generation open-weight vision-language model (VLM), PaliGemma 2. This model continues…
Hugging Face Launches SmolVLM: A Lightweight yet Powerful Open-Source Vision-Language Model That Runs Efficiently Locally ★ 80
Hugging Face Blog · 609 days ago · Release
Hugging Face has officially launched a lightweight vision-language model (VLM) called SmolVLM, designed to bring powerful multimodal understanding…
Hugging Face Launches Docmatix: A Massive Open-Source Dataset for Document Visual Question Answering (DocVQA) ★ 75
Hugging Face Blog · 740 days ago · Release
The Hugging Face official blog has announced the release of a new, massive dataset called "Docmatix," specifically designed for training and fine-tuning…
A Guide to Preference Optimization for Vision-Language Models (VLMs): DPO Fine-Tuning with TRL ★ 75
Hugging Face Blog · 748 days ago · Tutorial
As vision-language models (VLMs) are increasingly applied to multimodal tasks, how to make these models produce outputs that better align with human…
