Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model
Original: Introducing Gemma 4 12B: a unified, encoder-free multimodal model
Google launched Gemma 4 12B, a unified, encoder-free multimodal open model that simplifies cross-modal processing.
Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weights model with a unified, encoder-free multimodal architecture. Conventional designs pass images through a separate vision encoder (such as a ViT) before the language model sees them. Gemma 4 12B drops that encoder and processes all modalities directly within a single Transformer network. According to Google DeepMind, this design simplifies training, reduces inference latency, and improves cross-modal alignment, which the announcement presents as a significant milestone for open AI models.
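The summary does not say how Gemma 4 12B turns pixels into tokens. The NumPy code below is a minimal sketch of the general encoder-free pattern. It assumes raw image patches are mapped by a single linear projection into the same embedding space as text tokens. It then builds one mixed image-and-text sequence and runs it through a self-attention layer shared by both modalities. All sizes and names (`D_MODEL`, `PATCH`, `patchify`, `embed_multimodal`, `self_attention`) are hypothetical illustrations, not Gemma's actual configuration or API.

```python
import numpy as np

# Hypothetical toy sizes for illustration only; NOT Gemma 4 12B's real configuration.
D_MODEL = 64   # shared embedding width for every modality
PATCH = 8      # side length of a square image patch, in pixels
VOCAB = 1000   # toy text vocabulary size

rng = np.random.default_rng(0)

# Encoder-free: image patches go through ONE linear projection (no ViT) and land
# in the same embedding space as the text-token embeddings.
W_patch = rng.normal(0, 0.02, size=(PATCH * PATCH * 3, D_MODEL))
text_embed = rng.normal(0, 0.02, size=(VOCAB, D_MODEL))

# Projection weights for one single-head self-attention layer shared by all tokens.
W_q, W_k, W_v = (rng.normal(0, 0.02, size=(D_MODEL, D_MODEL)) for _ in range(3))


def patchify(image: np.ndarray) -> np.ndarray:
    """Split an (H, W, 3) image into flattened, non-overlapping PATCH x PATCH patches."""
    h, w, c = image.shape
    assert h % PATCH == 0 and w % PATCH == 0, "image must tile evenly into patches"
    patches = image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
    patches = patches.transpose(0, 2, 1, 3, 4)    # (grid_h, grid_w, PATCH, PATCH, c)
    return patches.reshape(-1, PATCH * PATCH * c)  # (num_patches, PATCH*PATCH*c), row-major


def embed_multimodal(image: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Build one sequence: projected image patches followed by text-token embeddings."""
    image_tokens = patchify(image) @ W_patch         # (num_patches, D_MODEL)
    text_tokens = text_embed[np.asarray(token_ids)]  # (num_text, D_MODEL)
    return np.concatenate([image_tokens, text_tokens], axis=0)


def self_attention(x: np.ndarray) -> np.ndarray:
    """Full (non-causal) single-head attention over the mixed sequence, plus a residual."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(D_MODEL)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v


# Usage: a 32x32 RGB image yields (32 / 8) ** 2 = 16 patch tokens, plus 3 text tokens.
image = rng.random((32, 32, 3))
seq = embed_multimodal(image, [5, 42, 7])
out = self_attention(seq)
print(seq.shape, out.shape)  # (19, 64) (19, 64)
```

A real model would stack many such layers with MLPs, positional encodings and causal masking. The point of the sketch is only that no separate vision network sits in front of the Transformer: image and text tokens share one sequence and one set of attention weights.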
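Google DeepMind today released the newest member of its open model family, Gemma 4 12B. With 12 billion parameters, the model represents a major breakthrough in the architectural design of open multimodal models.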
Source: Google DeepMind Blog. Summaries are AI-generated; the original article is authoritative.