Latest in AI

Showing: multimodal · Researchers

Google DeepMind Launches Improved Gemini Audio Models for More Powerful Voice Interactions ★ 85
Google DeepMind Blog · 227 days ago · Release
Google DeepMind has announced a major upgrade to its Gemini audio models, aimed at delivering a more natural, fluid, and low-latency voice interaction…
Google DeepMind Unveils Gemini 3 Pro Image Model "Nano Banana Pro": Ushering in Next-Generation Visual Generation and Building ★ 78
Google DeepMind Blog · 250 days ago · Release
Google DeepMind has unveiled a new model called "Nano Banana Pro," which is also the Pro-tier image model of the Gemini 3 generation (Gemini 3 Pro Image…
Start Building with Gemini 3: Google DeepMind's Next-Generation Model Officially Arrives ★ 95
Google DeepMind Blog · 251 days ago · Release
Google DeepMind today officially unveiled its latest generation AI model family — Gemini 3 — and extended an invitation to developers worldwide, formally…
Google DeepMind Unveils Next-Generation Gemini 3: A New Era of Intelligence with Proactive AI and Powerful Reasoning ★ 98
Google DeepMind Blog · 251 days ago · Release
Google DeepMind officially unveiled its latest flagship AI model — Gemini 3 — in November 2025. This marks a new milestone for Google in the field of…
Google DeepMind Introduces SIMA 2: A Gemini-Powered AI Agent That Plays, Reasons, and Learns with You in 3D Virtual Worlds ★ 85
Google DeepMind Blog · 257 days ago · Release
Google DeepMind has officially introduced SIMA 2 (Scalable Instructable Multiworld Agent 2). Compared to its predecessor, the most significant transformation…
Google DeepMind Launches MedGemma: The Most Powerful Open-Source Multimodal Model for Medical AI Development ★ 82
Google DeepMind Blog · 275 days ago · Release
Google DeepMind officially announced the launch of a new "MedGemma" multimodal model within its open-source medical model series. This model represents the…
Gemini 2.5 Flash-Lite Reaches Stable Release, Ready for Large-Scale Production Deployment ★ 75
Google DeepMind Blog · 275 days ago · Release
Google DeepMind today announced that Gemini 2.5 Flash-Lite — its lightweight AI model that had previously been in preview — has officially transitioned to a…
Google DeepMind Launches Experimental AI Tool "Backstory" to Help Users Explore the Context and Origin of Online Images ★ 75
Google DeepMind Blog · 277 days ago · New Tool
Google DeepMind recently unveiled a new experimental AI tool called "Backstory," designed to help internet users deeply explore and understand the background…
Gemini Robotics 1.5 Announced: Bringing AI Agents into the Physical World, Enabling Robots to Perceive, Plan, and Use Tools ★ 85
Google DeepMind Blog · 277 days ago · Release
Google DeepMind has officially announced the launch of Gemini Robotics 1.5, marking the formal entry of AI Agent technology into the physical world and…
Google Launches Gemini 2.5 Computer Use Model: Built on Gemini 2.5 Pro, Available in API Preview ★ 85
Google DeepMind Blog · 277 days ago · Release
Google DeepMind has officially launched the new dedicated "Gemini 2.5 Computer Use" model, which is now available in preview via API. This model is built on…
Hugging Face TRL Supports Vision-Language Model (VLM) Alignment: Multimodal DPO and ORPO Training Made Easy ★ 80
Hugging Face Blog · 355 days ago · Release
Hugging Face's TRL (Transformer Reinforcement Learning) is a popular open-source library specifically designed for aligning language models (LLMs). In its…
TimeScope: A New Benchmark for Probing the Limits of Long-Video Understanding in Video Large Multimodal Models (Video LMMs) ★ 75
Hugging Face Blog · 370 days ago · Release
As large multimodal models (LMMs) have achieved breakthroughs in image and short-video understanding, the industry has gradually shifted its attention to the…
Hugging Face Launches an Efficient Multimodal Data Pipeline (MMDP): A Data-Processing Tool to Speed Up VLM and Multimodal Model Training ★ 75
Hugging Face Blog · 385 days ago · New Tool
With the rapid development of vision-language models (VLMs) and multimodal AI, the amount of data required to train these models has grown explosively…
NVIDIA Llama Nemotron Nano VLM Officially Lands on Hugging Face Hub ★ 75
Hugging Face Blog · 395 days ago · Release
NVIDIA has partnered with Hugging Face to officially bring its latest lightweight vision-language model (VLM), the NVIDIA Llama Nemotron Nano VLM, to the…
Gemini 2.5 Achieves a New Breakthrough: Advanced Voice Dialog and Audio Generation ★ 85
Google DeepMind Blog · 419 days ago · Release
Google DeepMind has announced that its latest-generation model, Gemini 2.5, has achieved new breakthroughs in AI-driven audio dialog and audio generation. This…
nanoVLM: The Simplest Open-Source Project for Training Vision-Language Models (VLMs) in Pure PyTorch ★ 75
Hugging Face Blog · 433 days ago · Release
Hugging Face recently launched an open-source project called nanoVLM, positioned as "the simplest repository for training Vision Language Models (VLMs) in pure…
Google Releases Gemma 3n Preview: A Powerful, Efficient, Mobile-First On-Device Multimodal AI Model ★ 78
Google DeepMind Blog · 434 days ago · Release
Google DeepMind has officially released a preview of its new open model "Gemma 3n." This is a cutting-edge open model purpose-built for mobile devices and…
Hugging Face Releases 2025 Vision-Language Model (VLM) Guide: A Stronger, Faster, and More Practical Open-Source Era ★ 80
Hugging Face Blog · 442 days ago · Opinion
With the explosion of multimodal technology, Vision Language Models (VLMs) have evolved from laboratory research prototypes into core tools for enterprises and…
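Visual Salamandra 7B Released: Barcelona Supercomputing Center Launches Open-Source Large Multimodal Model Focused on Multilingual and Visual Understanding ★ 70
Hugging Face Blog · 473 days ago · Release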
The Language Technologies department (BSC-LT) of the Barcelona Supercomputing Center (BSC) recently released a new open-source multimodal model on Hugging Face…
Google Launches the New Gemma 3: An Open-Source Large Language Model with Multimodal, Multilingual, and Long-Context Support ★ 90
Hugging Face Blog · 503 days ago · Release
Google has officially launched Gemma 3, the next generation of its open-source large language model series — a major technical leap forward from Gemma 2. Gemma…
A Deep Dive into Aya Vision: Advancing the Frontier of Multilingual Multimodal AI ★ 75
Hugging Face Blog · 511 days ago · Release
Cohere For AI (C4AI) has officially launched "Aya Vision," a series of open-source multimodal models (available in 8B and 32B parameter versions) designed…
Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models ★ 80
Hugging Face Blog · 524 days ago · Release
Google has officially launched the PaliGemma 2 Mix model series — a new family of open-source instruction-tuned vision-language models (VLMs) now available on…
Hugging Face's Lightweight Agent Framework smolagents Now Officially Supports Vision-Language Models (VLMs) ★ 80
Hugging Face Blog · 550 days ago · Release
On January 24, 2025, Hugging Face announced that smolagents — its open-source library designed for building lightweight, high-performance AI agents — now…
Hugging Face Launches Even Lighter SmolVLM: New 256M and 500M Ultra-Small Vision-Language Models Arrive ★ 75
Hugging Face Blog · 551 days ago · Release
Hugging Face has officially introduced the newest members of the SmolVLM family, pushing vision-language model (VLM) sizes even further down to 256M (256…
Evaluating Audio Reasoning: Hugging Face Introduces the Big Bench Audio Benchmark ★ 75
Hugging Face Blog · 585 days ago · Release
As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
Google Launches New Vision-Language Model PaliGemma 2: A Lightweight Multimodal Model Built on Gemma 2 ★ 80
Hugging Face Blog · 600 days ago · Release
Google and Hugging Face have jointly announced the release of a new generation of open-weight vision-language model (VLM) — PaliGemma 2. This model continues…
Hugging Face Launches SmolVLM: A Lightweight yet Powerful Open-Source Vision-Language Model That Runs Efficiently on Local Machines ★ 80
Hugging Face Blog · 609 days ago · Release
Hugging Face has officially launched a lightweight vision language model (VLM) called SmolVLM, designed to bring powerful multimodal understanding…
CinePile 2.0: Building a Stronger Long-Video Question-Answering Dataset with Adversarial Refinement ★ 75
Hugging Face Blog · 643 days ago · Release
CinePile is a multimodal question-answering dataset focused on movie and long-video understanding. In traditional dataset construction, researchers commonly…
Meta Launches Llama 3.2: Vision Multimodality and Lightweight Models for Edge Devices, with Full Hugging Face Support ★ 95
Hugging Face Blog · 671 days ago · Release
Meta has officially introduced the Llama 3.2 family of open-source models, marking a significant architectural upgrade with two major breakthroughs: multimodal…
Behind the Scenes of FineVideo: How Hugging Face Built a High-Quality Open-Source Video Dataset ★ 75
Hugging Face Blog · 673 days ago · Release
With the explosion of video generation and understanding models such as Sora and Gen-3, high-quality video training data has become a key battleground for…
