Vector databases store, index, and query embeddings, the numerical representations of data. Google Embedding 2, also known as gemini-embedding-2-preview, is Google's multimodal embedding model, designed to map text, images, video, audio, and documents into a single unified vector space. The model outputs 3072-dimensional vectors by default and supports scaling down to 1536 or 768 dimensions. Any vector database that can store floating-point vectors of these dimensions can store and manage embeddings generated by Google Embedding 2.
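To make the trade-off between the three output sizes concrete, the sketch below estimates raw storage per vector. It assumes each dimension is stored as a 32-bit float (4 bytes) and ignores index structures and metadata, so real footprints will be larger; the function and constant names are illustrative only.

```python
# Assumption: each dimension is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4
OUTPUT_DIMS = (3072, 1536, 768)  # output sizes named in the text above


def raw_vector_bytes(dim: int, num_vectors: int) -> int:
    """Bytes needed for the raw vectors alone (no index, no metadata)."""
    return dim * BYTES_PER_FLOAT32 * num_vectors


if __name__ == "__main__":
    for dim in OUTPUT_DIMS:
        gib = raw_vector_bytes(dim, 1_000_000) / 2**30
        print(f"{dim:>4} dims: {gib:.2f} GiB per million vectors")
```

Halving the dimension halves the raw storage, which is the main practical reason to consider the 1536 or 768 options for large collections.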
Both Milvus, an open-source vector database, and Zilliz Cloud, a fully managed service built on Milvus, can store and query vector embeddings generated by Google Embedding 2. Their core function is to store high-dimensional vectors and run efficient similarity searches over them, so once the embeddings are generated they can be inserted into a Milvus or Zilliz Cloud collection. In Milvus, the collection schema defines the vector field's dimension (dim), which must match the embedding model's output dimension exactly, for example 3072 for Google Embedding 2's default output. A matching dim ensures the embeddings are accepted for storage and can be used in subsequent vector searches; the same dimension must be used for stored vectors and query vectors.
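The schema step described above might look like the following minimal sketch using the pymilvus `MilvusClient` API. The collection name, field names, and local database path are hypothetical, the vectors are random placeholders standing in for real model output, and the local-file URI assumes Milvus Lite is installed; for a Milvus server or Zilliz Cloud you would pass its URI and token instead.

```python
import random

EMBEDDING_DIM = 3072            # default output size named in the text
COLLECTION = "embedding_demo"   # hypothetical collection name


def check_dim(vector, dim=EMBEDDING_DIM):
    """Fail early if a vector's length does not match the schema's dim."""
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return vector


def run_demo(uri="./milvus_demo.db"):
    # Imported here so check_dim stays usable without pymilvus installed.
    from pymilvus import DataType, MilvusClient

    client = MilvusClient(uri)  # a local file path selects Milvus Lite

    # The vector field's dim must equal the embedding model's output size.
    schema = MilvusClient.create_schema(auto_id=True, enable_dynamic_field=False)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="source", datatype=DataType.VARCHAR, max_length=512)
    schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR,
                     dim=EMBEDDING_DIM)

    index_params = client.prepare_index_params()
    index_params.add_index(field_name="embedding", index_type="AUTOINDEX",
                           metric_type="COSINE")

    if client.has_collection(COLLECTION):
        client.drop_collection(COLLECTION)
    client.create_collection(collection_name=COLLECTION, schema=schema,
                             index_params=index_params)

    # Placeholder vectors; real ones would come from the embedding model.
    rows = [
        {"source": f"item-{i}",
         "embedding": check_dim([random.random() for _ in range(EMBEDDING_DIM)])}
        for i in range(3)
    ]
    client.insert(collection_name=COLLECTION, data=rows)

    # Query with a vector of the same dimension as the stored ones.
    return client.search(collection_name=COLLECTION, data=[rows[0]["embedding"]],
                         anns_field="embedding", limit=2,
                         output_fields=["source"])
```

Checking the length on the client side is optional, since Milvus validates it as well, but it surfaces a mismatched output-dimension setting before any network call is made.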
Milvus also integrates directly with Google Cloud Vertex AI, the platform through which Google Embedding 2 is available. Milvus 2.6 introduced an "Embedding Function" feature, also known as Data-in, Data-out, that lets Milvus connect directly to major model providers, including Google Vertex AI, to generate embeddings. The Milvus documentation explicitly lists other Vertex AI embedding models, such as gemini-embedding-001, as compatible. Because the integration goes through Vertex AI, gemini-embedding-2-preview should also be usable, provided the necessary configuration and API access are in place. Zilliz Cloud, as a managed service based on Milvus with strong integration into the Google Cloud ecosystem, simplifies this process by providing a single environment for both generating and storing such embeddings. Developers can then build AI applications such as Retrieval-Augmented Generation (RAG) systems that use Google Embedding 2's multimodal capabilities on top of a vector database.
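As a rough illustration of the Embedding Function feature, the sketch below attaches a Vertex AI text embedding function to a schema with pymilvus. It uses gemini-embedding-001, the model the Milvus documentation lists, and assumes that model's 3072-dimension default output. The parameter keys (`provider`, `model_name`, `projectid`) reflect my reading of the Milvus 2.6 documentation and should be verified against your version; the field names and project ID are hypothetical, and the server-side Google credentials configuration is not shown. Note that this function type operates on a text field, so it covers text input only.

```python
def vertex_ai_params(project_id, model_name="gemini-embedding-001"):
    """Build the params dict for a Vertex AI embedding function.

    Assumption: these key names follow the Milvus 2.6 documentation;
    check them against the version you are running.
    """
    return {
        "provider": "vertexai",
        "model_name": model_name,
        "projectid": project_id,
    }


def build_schema(project_id):
    # Imported here so vertex_ai_params works without pymilvus installed.
    from pymilvus import DataType, Function, FunctionType, MilvusClient

    schema = MilvusClient.create_schema()
    schema.add_field(field_name="id", datatype=DataType.INT64,
                     is_primary=True, auto_id=True)
    # Raw text goes in; Milvus calls the provider to fill the vector field.
    schema.add_field(field_name="document", datatype=DataType.VARCHAR,
                     max_length=9000)
    schema.add_field(field_name="dense", datatype=DataType.FLOAT_VECTOR,
                     dim=3072)  # assumed default output size of the model

    schema.add_function(Function(
        name="vertex_embed",  # hypothetical function name
        function_type=FunctionType.TEXTEMBEDDING,
        input_field_names=["document"],
        output_field_names=["dense"],
        params=vertex_ai_params(project_id),
    ))
    return schema
```

With a schema like this, inserts would supply only the `document` text and searches would pass query text rather than a precomputed vector. Swapping in gemini-embedding-2-preview would, under the same assumptions, mean changing `model_name`, which is untested here.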
