Hugging Face Blog | Feb 21, 2025, 12:00 AM | important 80

Google Releases SigLIP 2: A Stronger Multilingual Vision-Language Encoder

Original: SigLIP 2: A better multilingual vision language encoder

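Google and Hugging Face have jointly announced the SigLIP 2 vision-language encoder. As an upgrade to the original SigLIP, SigLIP 2 introduces dynamic resolution, self-supervised learning (SSL) auxiliary tasks, and stronger multilingual support. It performs well on zero-shot classification, image-text retrieval, and localization tasks, and it comes in multiple model sizes, making it well suited as the vision backbone for next-generation vision-language models (VLMs).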

Google has officially launched SigLIP 2, a major upgrade to its widely popular SigLIP (Sigmoid Loss for Language-Image Pre-training) vision-language encoder. SigLIP 2 is designed to provide stronger visual and linguistic representation capabilities for multimodal tasks.
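
For readers unfamiliar with the "sigmoid loss" in the SigLIP name, here is a minimal sketch of the idea, assuming the pairwise formulation from the original SigLIP work: every image-text combination in a batch is scored as an independent binary decision (match or no match) instead of through a batch-wide softmax. The function names, the toy embeddings, and the fixed `temperature` and `bias` values are illustrative assumptions; in practice the temperature and bias are learned, and this summary does not describe SigLIP 2's actual training code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sigmoid_pairwise_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss, averaged over the images in the batch.

    Assumes image_embs[i] and text_embs[i] form a matching pair; every
    other image-text combination in the batch is treated as a negative.
    temperature and bias are fixed here purely for illustration.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        for j, txt in enumerate(txts):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0  # +1 for the true pair, -1 otherwise
            total -= log_sigmoid(label * logit)
    return total / len(imgs)


# Toy batch: two images and two captions as 2-D embeddings (hypothetical data).
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = sigmoid_pairwise_loss(aligned, aligned)  # captions match their images
loss_swapped = sigmoid_pairwise_loss(aligned, swapped)  # captions are mismatched
```

In this toy batch the correctly paired embeddings produce a much lower loss than the swapped ones, which is the behaviour the training objective rewards. Because each pair is scored independently, the loss needs no normalization across the whole batch.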

open-source huggingface #vision-encoder #vlm #multilingual #clip #computer-vision

Summaries are AI-generated; the original article is authoritative.