Latest in AI

Showing: computer-vision · Developers

Kaiming He's All-Undergrad Team Achieves Text-to-Image With Only 258M Parameters
量子位 QbitAI · 39 days ago · Paper
A new research paper from Kaiming He's lab — notable for having an all-undergraduate team — demonstrates that high-quality text-to-image generation can be achieved with just 258 million parameters. This challenges the prevailing assumption that competitive image synthesis requires multi-billion-parameter models. The work signals a push toward leaner, more accessible generative vision architectures.
ABot-Earth0.5 Tops Three Hugging Face Paper Leaderboards, Earns Praise from Graphics Expert Chen Baoquan
量子位 QbitAI · 39 days ago · Paper
ABot-Earth0.5, a newly released research project, has taken the top spot on three Hugging Face paper leaderboards at the same time. The achievement drew public praise from Chen Baoquan, an internationally respected authority in computer graphics, and signals growing recognition for the project in both the AI research and graphics communities.
Mistral AI Introduces Mistral OCR 3
Mistral AI News · 40 days ago · Release
Mistral AI has released Mistral OCR 3, the latest version of its document-parsing and optical character recognition model. The announcement, framed as a research release, signals continued investment by Mistral in structured document understanding. No article body was available; details are inferred from the title and publication metadata alone.
From ICRA to CVPR: What Is the Robotics Community Talking About? | Beijing Wednesday Evening
量子位 QbitAI · 42 days ago · Commentary
QbitAI hosts a Beijing Wednesday-evening meetup tracing the key conversations in the robotics research community from ICRA to CVPR. The event format — common in China's academic tech circuit — brings together researchers and engineers to unpack conference highlights, emerging trends, and cross-disciplinary intersections. No specific paper or product is the focus; the value is in aggregated community signal across two flagship venues.
SCAIL-2: Open-Source End-to-End Character Animation Without Intermediate Pose Representations
r/LocalLLaMA top day · 48 days ago · Release
SCAIL-2 by zai-org drops the skeleton maps and inpainting masks that earlier character animation pipelines depended on, and instead drives characters directly from video in an end-to-end manner. The model was trained on 60K motion pairs synthesized with SCAIL-Preview, Wan-Animate, and MoCha through a Unified Motion Transfer Interface built on a RoPE-based design, and it develops emergent abilities beyond those of its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Launch HN: Transload (YC P26) – Measuring Freight Items with CCTV
Hacker News (AI keywords) · 48 days ago · New Tool
Transload is a Y Combinator P26 startup that applies computer vision to existing CCTV footage to automatically calculate freight item dimensions, eliminating manual measurement or expensive dedicated hardware. The approach lowers adoption barriers for warehouses and logistics operators by repurposing infrastructure already in place. The team launched on Hacker News to gather early feedback from the developer and logistics community.
Unlocking VLM Potential on Satellite Imagery Through Fine-Tuning
Mistral AI News · 50 days ago · Tutorial
Mistral AI demonstrates how LoRA fine-tuning adapts Pixtral-12B to satellite imagery, a specialized visual domain where prompting alone is unreliable. Using the Aerial Image Dataset, the post compares a prompt-based baseline against a fine-tuned model across 30 scene classes. Accuracy rose from 0.56 to 0.91, while invalid label hallucinations dropped from 5% to 0.1%.
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out ★ 76
量子位 QbitAI · 50 days ago · Paper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
Reddit Discusses: What is Your Most Unusual Non-LLM AI Tool for Daily Use?
r/LocalLLaMA top day · 51 days ago · Commentary
A popular thread on Reddit's r/LocalLLaMA asks users to share their most unusual or underrated non-LLM AI tools used in daily workflows. While LLMs dominate the spotlight, many developers and power users emphasize that single-purpose models—such as Whisper for transcription, Demucs for audio separation, and Segment Anything (SAM) for vision—offer superior efficiency and lower costs. The discussion highlights a growing trend toward practical, lightweight, and local AI solutions for specific tasks.
DeFlock Hits 100k ALPRs Mapped in USA
Hacker News (AI keywords) · 57 days ago · Ethics
A Hacker News post highlights DeFlock reaching 100,000 mapped automated license plate readers in the United States. The original article text was not provided, so the confirmed facts are limited mainly to the title and public context around DeFlock. The item is most relevant to privacy, computer-vision surveillance, civic mapping, and governance rather than new AI models or developer tooling.
Viral Online: Figure AI Livestreams Its Humanoid Robot Moving Packages for 24 Hours, Sparking Intense Community Interest
Ars Technica AI · 69 days ago · Release
Humanoid robot startup Figure AI recently launched a highly buzzworthy technology showcase: a 24-hour uninterrupted live stream depicting its latest humanoid…
Ukrainian Drone Founder Yaroslav Azhnyuk on the Autonomous Drone Tech Stack and Drone Economics: "The West Is Asleep"
Latent Space · 71 days ago · Commentary
In this episode of the Latent Space podcast, the hosts and guest host Noah Smith (author of the well-known economics and technology blog Noahpinion)…
Google DeepMind Unveils Gemini Omni: A New Natively Omni-Modal Model for Ultra-Low-Latency Real-Time Video and Voice Interaction ★ 95
Google DeepMind Blog · 71 days ago · Release
Google DeepMind has officially unveiled its latest flagship AI model, "Gemini Omni." This model represents a major breakthrough by Google in the field of…
Gemini Robotics-ER 1.6 Released: Enhanced Embodied Reasoning for Real-World Robotics Tasks ★ 85
Google DeepMind Blog · 105 days ago · Release
Google DeepMind has officially announced its latest breakthrough in the field of embodied AI: Gemini Robotics-ER 1.6. This model is specifically designed…
Sentence Transformers Adds Support for Multimodal Embedding and Reranker Models ★ 78
Hugging Face Blog · 110 days ago · Release
The popular open-source library `sentence-transformers` from Hugging Face has received a major update, officially introducing native support for Multimodal…
TII Launches Falcon Perception, a New Multimodal Perception Model ★ 75
Hugging Face Blog · 118 days ago · Release
The Technology Innovation Institute (TII) of the UAE has officially announced the launch of its new "Falcon Perception" model on the Hugging Face blog. As an…
ImportAI 449: LLMs Training LLMs, 72B Distributed Training, Why Is Computer Vision Harder Than Text Generation? And Will AI Trigger a Political Transition? ★ 75
Import AI (Jack Clark) · 134 days ago · Commentary
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
D4RT: Teaching AI to See the World in Four Dimensions, With Dynamic 4D Reconstruction and Tracking Up to 300x Faster ★ 80
Google DeepMind Blog · 193 days ago · Release
Google DeepMind has published a new technology called D4RT, designed to enable artificial intelligence to understand and reconstruct the dynamic world we live…
Running Isaac 0.1 on Replicate: A Lightweight Embodied Vision-Language Model Designed for Real-World Perception
Replicate Blog · 244 days ago · Release
The cloud AI model deployment and hosting platform Replicate has officially announced support for running the new lightweight vision-language model (VLM) —…
New Google DeepMind Research: Teaching AI to Understand and Organize the Visual World Like Humans Do ★ 75
Google DeepMind Blog · 259 days ago · Paper
Google DeepMind has recently published an important study examining the fundamental differences between how AI systems and humans "organize and understand the…
Google DeepMind Uses AI to Map, Model, and Understand Nature: Protecting Forests and Listening to Birdsong ★ 75
Google DeepMind Blog · 264 days ago · Opinion
Google DeepMind recently published a feature article exploring how artificial intelligence (AI) can address the dual challenges of global climate change and…
Google DeepMind Launches Experimental AI Tool "Backstory" to Help Users Explore the Context and Origins of Online Images ★ 75
Google DeepMind Blog · 277 days ago · New Tool
Google DeepMind recently unveiled a new experimental AI tool called "Backstory," designed to help internet users deeply explore and understand the background…
Hugging Face Explores "AI for Food Allergies": How Open-Source Technology Can Help Keep Diets Safe
Hugging Face Blog · 284 days ago · Opinion
Hugging Face recently published a feature article on "AI for Food Allergies" in its "Hugging Science" column. Food allergies are a global health concern…
SOTA On-Device OCR on Apple Platforms with Core ML and dots.ocr ★ 72
Hugging Face Blog · 299 days ago · Release
This technical article from Hugging Face introduces how to deploy a state-of-the-art (SOTA) optical character recognition (OCR) model called dots.ocr using…
Arm and Hugging Face Team Up to Launch "Neural Super Sampling": Accelerating AI Image Supersampling on Mobile and Edge Devices ★ 75
Hugging Face Blog · 350 days ago · Release
Arm and Hugging Face have announced a collaboration to launch "Neural Super Sampling (NSS)" technology and related models, officially bringing AI-driven image…
Hugging Face Releases Its 2025 Vision Language Model (VLM) Guide: A Stronger, Faster, More Practical Open-Source Era ★ 80
Hugging Face Blog · 442 days ago · Opinion
With the explosion of multimodal technology, Vision Language Models (VLMs) have evolved from laboratory research prototypes into core tools for enterprises and…
Google Launches SigLIP 2: A More Powerful Multilingual Vision-Language Encoder ★ 80
Hugging Face Blog · 522 days ago · Release
Google has officially launched SigLIP 2, a major upgrade to its widely popular SigLIP (Sigmoid Loss for Language-Image Pre-training) vision-language encoder…
Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models ★ 80
Hugging Face Blog · 524 days ago · Release
Google has officially launched the PaliGemma 2 Mix model series — a new family of open-source instruction-tuned vision-language models (VLMs) now available on…
Hugging Face's Lightweight Agent Framework smolagents Now Officially Supports Vision Language Models (VLMs) ★ 80
Hugging Face Blog · 550 days ago · Release
On January 24, 2025, Hugging Face announced that smolagents — its open-source library designed for building lightweight, high-performance AI agents — now…
Timm ❤️ Transformers: Use Any timm Vision Model Directly in Transformers ★ 80
Hugging Face Blog · 558 days ago · Release
The official Hugging Face blog has announced exciting news for the computer vision (CV) community: the popular PyTorch image model library `timm` (PyTorch…
