Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model★ 85
Google DeepMind Blog·5 hours ago·Release
Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weights model featuring a unified, encoder-free multimodal architecture. By eliminating the traditional separate vision encoder (such as ViT), it processes diverse modalities directly within a single Transformer network. This design simplifies training, reduces inference latency, and enhances cross-modal alignment, marking a significant milestone for open-source AI.