Latest in AI

Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 47 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Google announces Gemini 3.5 Live Translate for instant voice-to-voice translation
Ars Technica AI · 48 days ago · New Tool
Google has announced Gemini 3.5 Live Translate, a real-time voice-to-voice translation system that preserves the original speaker's tone, pacing, and pitch rather than producing flat synthetic output. The system embeds Google's SynthID watermarks into translated audio, enabling AI content provenance detection without affecting audio quality. This extends Google's Gemini Live multimodal API capabilities into cross-language communication scenarios such as meetings, live streams, and customer service.
Google's Gemma 4 12B is designed to run on 16GB RAM laptops
Ars Technica AI · 54 days ago · Release
Google introduced Gemma 4 12B, an open model aimed at running locally on laptops with 16GB of RAM. The model uses a new encoding scheme and token prediction to improve efficiency relative to its size. Its practical importance depends on real-world benchmarks, but it could lower the barrier for private, offline, and local multimodal AI workflows.
Hands-On with Gemini Omni, Google's New "Anything-to-Anything" AI Model: Impressive and Nearly Seamless ★ 85
The Verge AI · 66 days ago · Release
Google recently unveiled a brand-new "anything-to-anything" multimodal AI model — Gemini Omni — whose powerful cross-modal generation and transformation…
Mysterious AI Startup Hark Closes $700 Million Series A to Build a "Universal" AI Interface and Dedicated Hardware ★ 75
TechCrunch AI · 68 days ago · Business
The mysterious AI startup Hark has announced the successful completion of a Series A funding round totaling $700 million (approximately NT$22 billion), capital…
Google I/O 2026 Major Announcements: Gemini 3.5 Flash, Omni (NanoBanana Video Model), Spark Background Agent, and Antigravity 2.0 ★ 85
Latent Space · 69 days ago · Release
In the latest issue of Latent Space AINews, the major announcements from Google I/O 2026 were covered in depth. Google demonstrated its formidable R&D and…
Google DeepMind Unveils Gemini Omni: A New Natively Omni-Modal Model for Ultra-Low-Latency Real-Time Audio-Video and Voice Interaction ★ 95
Google DeepMind Blog · 71 days ago · Release
Google DeepMind has officially unveiled its latest flagship AI model, "Gemini Omni." This model represents a major breakthrough by Google in the field of…
Gemini for Science: Google DeepMind Launches New Scientific AI Tools and Experiments, Opening a New Era of Discovery ★ 85
Google DeepMind Blog · 72 days ago · Release
Google DeepMind has unveiled a new initiative called "Gemini for Science" — a collection of AI tools and experiments designed to expand the scale and precision…
NVIDIA Launches Nemotron 3 Nano Omni: A Long-Context Multimodal Intelligence Model Designed for Document, Speech, and Video Agents ★ 75
Hugging Face Blog · 90 days ago · Release
NVIDIA has officially launched a new lightweight multimodal model, "Nemotron 3 Nano Omni." This model is designed to deliver powerful multimodal intelligence…
TII Launches New Falcon Perception Multimodal Perception Model ★ 75
Hugging Face Blog · 118 days ago · Release
The Technology Innovation Institute (TII) of the UAE has officially announced the launch of its new "Falcon Perception" model on the Hugging Face blog. As an…
IBM Launches Granite 4.0 3B Vision: A Lightweight Multimodal AI Model Designed for Enterprise Documents ★ 75
Hugging Face Blog · 118 days ago · Release
IBM has officially launched its new lightweight multimodal model on Hugging Face — the Granite 4.0 3B Vision. With 3 billion (3B) parameters, this model is…
Hugging Face Open-Source Ecosystem Report: Spring 2026 Edition ★ 85
Hugging Face Blog · 132 days ago · Commentary
Hugging Face has published its Spring 2026 "State of Open Source AI" report, offering a comprehensive review of the explosive growth and paradigm shifts that…
Gemini Adds New Music Creation Feature: Integrates the Lyria 3 Model to Generate 30-Second Music Tracks from Text and Images ★ 78
Google DeepMind Blog · 159 days ago · Release
Google DeepMind announced today (February 18, 2026) that its popular AI assistant application Gemini has officially integrated its most advanced music…
Google DeepMind Releases Improved Gemini Audio Models for a More Powerful Voice Interaction Experience ★ 85
Google DeepMind Blog · 227 days ago · Release
Google DeepMind has announced a major upgrade to its Gemini audio models, aimed at delivering a more natural, fluid, and low-latency voice interaction…
Google DeepMind Unveils Gemini 3 Pro Image Model "Nano Banana Pro": Ushering in Next-Generation Visual Generation and Building ★ 78
Google DeepMind Blog · 249 days ago · Release
Google DeepMind has unveiled a new model called "Nano Banana Pro," which is also the Pro-tier image model of the Gemini 3 generation (Gemini 3 Pro Image…
Start Building with Gemini 3: Google DeepMind's New-Generation Model Officially Arrives ★ 95
Google DeepMind Blog · 251 days ago · Release
Google DeepMind today officially unveiled its latest generation AI model family — Gemini 3 — and extended an invitation to developers worldwide, formally…
Google DeepMind Unveils Next-Generation Gemini 3: A New Era of Intelligence with Proactive AI and Powerful Reasoning ★ 98
Google DeepMind Blog · 251 days ago · Release
Google DeepMind officially unveiled its latest flagship AI model — Gemini 3 — in November 2025. This marks a new milestone for Google in the field of…
Google DeepMind Launches SIMA 2: A Gemini-Powered AI Agent That Plays, Reasons, and Learns Alongside You in 3D Virtual Worlds ★ 85
Google DeepMind Blog · 256 days ago · Release
Google DeepMind has officially introduced SIMA 2 (Scalable Instructable Multiworld Agent 2). Compared to its predecessor, the most significant transformation…
Gemini 2.5 Flash-Lite Reaches Stable Release, Supporting Large-Scale Production Deployment ★ 75
Google DeepMind Blog · 275 days ago · Release
Google DeepMind today announced that Gemini 2.5 Flash-Lite — its lightweight AI model that had previously been in preview — has officially transitioned to a…
Google DeepMind Launches Experimental AI Tool "Backstory" to Help Users Explore the Context and Origin of Online Images ★ 75
Google DeepMind Blog · 277 days ago · New Tool
Google DeepMind recently unveiled a new experimental AI tool called "Backstory," designed to help internet users deeply explore and understand the background…
Gemini Robotics 1.5 Announced: Bringing AI Agents into the Physical World, Enabling Robots to Perceive, Plan, and Use Tools ★ 85
Google DeepMind Blog · 277 days ago · Release
Google DeepMind has officially announced the launch of Gemini Robotics 1.5, marking the formal entry of AI Agent technology into the physical world and…
Google Launches Gemini 2.5 Computer Use Model: Built on Gemini 2.5 Pro, Available in API Preview ★ 85
Google DeepMind Blog · 277 days ago · Release
Google DeepMind has officially launched the new dedicated "Gemini 2.5 Computer Use" model, which is now available in preview via API. This model is built on…
Hugging Face Adds Image Features to AI Sheets: Batch Image Processing and Multimodal Analysis in a Spreadsheet
Hugging Face Blog · 280 days ago · New Tool
Hugging Face has recently released a major update for its innovative spreadsheet AI tool "AI Sheets," officially unlocking powerful image processing…
Gemini 2.5 Achieves New Breakthroughs: Advanced Voice Dialog and Audio Generation ★ 85
Google DeepMind Blog · 419 days ago · Release
Google DeepMind has announced that its latest-generation model, Gemini 2.5, has achieved new breakthroughs in AI-driven audio dialog and audio generation. This…
Google Releases Gemma 3n Preview: A Powerful, Efficient, Mobile-First On-Device Multimodal AI Model ★ 78
Google DeepMind Blog · 434 days ago · Release
Google DeepMind has officially released a preview of its new open model "Gemma 3n." This is a cutting-edge open model purpose-built for mobile devices and…
Hugging Face Releases 2025 Vision Language Model (VLM) Guide: A New Open-Source Era That Is Stronger, Faster, and More Practical ★ 80
Hugging Face Blog · 442 days ago · Opinion
With the explosion of multimodal technology, Vision Language Models (VLMs) have evolved from laboratory research prototypes into core tools for enterprises and…
Google Launches New Gemma 3: An Open-Source Large Language Model with Multimodal, Multilingual, and Long-Context Support ★ 90
Hugging Face Blog · 503 days ago · Release
Google has officially launched Gemma 3, the next generation of its open-source large language model series — a major technical leap forward from Gemma 2. Gemma…
A Deep Dive into Aya Vision: Advancing the Frontier of Multilingual Multimodal AI ★ 75
Hugging Face Blog · 511 days ago · Release
Cohere For AI (C4AI) has officially launched "Aya Vision," a series of open-source multimodal models (available in 8B and 32B parameter versions) designed…
Going Multimodal: How Prezi Uses the Hugging Face Hub and the Expert Support Program to Accelerate Its Machine Learning Roadmap
Hugging Face Blog · 769 days ago · Business
In this case study, Prezi — the well-known company behind the non-linear presentation software of the same name — shares how it is embracing the "multimodal…
Abu Dhabi's TII Releases Falcon 2 11B: Pretrained Language and Vision-Language Models Trained on 5 Trillion Tokens ★ 75
Hugging Face Blog · 795 days ago · Release
The Technology Innovation Institute (TII) of Abu Dhabi has officially released a new open-source model family on Hugging Face — Falcon 2 11B. This model, with…
