Yes, Gemma 4 generates embeddings for both text and images, enabling unified multimodal semantic search across content types.
Gemma 4's Per-Layer Embeddings architecture produces vector representations at each decoder layer. Rather than being limited to the final output, you can extract embeddings from intermediate layers, trading embedding dimensionality against quality and latency to suit your application.
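To make layer selection concrete, here is a minimal pure-Python sketch of the pooling step: it averages one chosen layer's token vectors into a single fixed-size embedding. The nested lists are placeholders for hidden states you would actually obtain from the model; the values and layer choice are illustrative assumptions, not Gemma 4 output.

```python
def mean_pool(token_states):
    """Average a layer's per-token vectors into one fixed-size embedding."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]

# hidden_states[layer][token][dim]: placeholder values standing in for
# the per-layer outputs a model would produce for one input sequence.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0]],  # layer 0
    [[0.0, 2.0], [4.0, 6.0]],  # layer 1 (an intermediate layer)
]

embedding = mean_pool(hidden_states[1])  # pick the intermediate layer
print(embedding)  # [2.0, 4.0]
```

Picking an earlier layer yields the same pooling logic over different token states; which layer works best is an empirical choice for your retrieval task.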
The multimodal capability is particularly powerful: because images and text are embedded in the same vector space, you get true cross-modal semantic search. For example, you can retrieve images with a text query, or surface text related to an image. A shared embedding space like this is the foundation of modern multimodal retrieval systems.
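Cross-modal search reduces to nearest-neighbor ranking by cosine similarity in the shared space. The sketch below shows that step in pure Python; the filenames and vectors are made-up placeholders standing in for embeddings a multimodal model would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder image embeddings, assumed to live in the same space
# as text embeddings (values are illustrative, not model output).
image_index = {
    "sunset.jpg": [0.9, 0.1, 0.0],
    "cat.jpg":    [0.1, 0.9, 0.2],
}
text_query = [0.8, 0.2, 0.1]  # placeholder embedding of a text query

best = max(image_index, key=lambda name: cosine(text_query, image_index[name]))
print(best)  # sunset.jpg
```

The same ranking works in the other direction: embed an image as the query and rank stored text vectors, since both modalities share one space.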
When paired with Zilliz Cloud, Gemma 4's embeddings benefit from enterprise-grade vector database capabilities: low-latency similarity search across billions of vectors, multi-replica availability, automatic failover, and integrated backups. This managed approach removes the operational burden of running your own vector infrastructure.
Zilliz Cloud handles indexing, scaling, and performance optimization automatically. You focus on generating quality embeddings with Gemma 4; Zilliz Cloud ensures they're stored, searched, and maintained reliably at any scale.
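To make the store-then-search workflow concrete, here is a toy in-memory index in pure Python that scans every vector exhaustively. This is exactly the kind of brute-force search that a managed service replaces with approximate nearest-neighbor indexes once collections grow large; the class and document names are illustrative, not a Zilliz API.

```python
import heapq
import math

class BruteForceIndex:
    """Toy vector store: insert embeddings, then scan all of them per query."""

    def __init__(self):
        self._vectors = {}

    def insert(self, key, vec):
        self._vectors[key] = vec

    def search(self, query, k=3):
        """Return the keys of the k vectors most similar to the query."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        return heapq.nlargest(k, self._vectors,
                              key=lambda name: cos(query, self._vectors[name]))

idx = BruteForceIndex()
idx.insert("doc1", [1.0, 0.0])
idx.insert("doc2", [0.0, 1.0])
idx.insert("doc3", [0.7, 0.7])
print(idx.search([1.0, 0.1], k=2))  # ['doc1', 'doc3']
```

The exhaustive scan here is O(n) per query; a managed vector database swaps it for indexed search plus replication and failover, which is the operational work the paragraph above describes.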
Related Resources