OpenAI text-embedding-3-large

AI Model Milvus & Zilliz Cloud Integrated

Task: Embedding

Modality: Text

Similarity Metric: Any (Normalized)

License: Proprietary

Dimensions: 3072

Max Input Tokens: 8191

Price: $0.13/1M tokens

Introduction to text-embedding-3-large

text-embedding-3-large is OpenAI’s large text embedding model, creating embeddings with up to 3072 dimensions. Compared to OpenAI’s other text embedding models, such as text-embedding-ada-002 and text-embedding-3-small, text-embedding-3-large scores higher on the benchmarks listed below, at a higher price per token.
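"Up to 3072 dimensions" means the output size can be reduced. OpenAI's embeddings endpoint accepts a `dimensions` parameter for the text-embedding-3 models; if you already have full-size vectors, a roughly equivalent local step is to keep the leading components and rescale to unit length. The sketch below illustrates that local step only. The helper name `shorten_embedding` and the toy 4-dimensional vector are assumptions for illustration, not part of any SDK.

```python
import math

def shorten_embedding(vector, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

# Toy 4-dimensional vector standing in for a real 3072-dimensional embedding.
full = [3.0, 4.0, 12.0, 0.0]
short = shorten_embedding(full, 2)
print(short)
```

Shorter vectors cost less to store and search, usually at some loss of retrieval quality, so it is worth measuring on your own data before committing to a size.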

Let’s take a quick look at some basics.

Model	Dimensions	Max Tokens	MIRACL avg	MTEB avg	Price
text-embedding-3-large	3072	8191	54.9	64.6	$0.13 / 1M tokens
text-embedding-ada-002	1536	8191	31.4	61.0	$0.10 / 1M tokens
text-embedding-3-small	1536	8191	44.0	62.3	$0.02 / 1M tokens

How to generate vector embeddings with the text-embedding-3-large model

There are two primary ways to create vector embeddings:

PyMilvus: the Python SDK for Milvus that seamlessly integrates the text-embedding-3-large model.
OpenAI Embedding: the Python SDK offered by OpenAI.

Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:

Sign up for a Zilliz Cloud account for free.
Set up a serverless cluster and obtain the Public Endpoint and API Key.
Create a vector collection and insert your vector embeddings.
Run a semantic search on the stored embeddings.
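Conceptually, the last step ranks the stored vectors by their similarity to the query vector. The sketch below shows that idea in memory with toy 3-dimensional unit vectors in place of real embeddings. The texts, vectors, and function names are illustrative assumptions, and a vector database typically uses an index rather than this brute-force scan.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query, stored):
    """Return (text, score) pairs, most similar first.

    Assumes every vector has unit length, so the dot product
    equals the cosine similarity.
    """
    scored = [(text, dot(query, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy unit vectors; real embeddings would have 3072 components.
stored = [
    ("doc about AI history", [1.0, 0.0, 0.0]),
    ("doc about Turing's birthplace", [0.0, 1.0, 0.0]),
    ("doc about Turing's research", [0.6, 0.8, 0.0]),
]
query = [0.0, 0.6, 0.8]
ranked = rank_by_similarity(query, stored)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

Because the model's embeddings are normalized, ranking by dot product, cosine similarity, or Euclidean distance yields the same order, which is why the similarity metric is listed as "Any (Normalized)".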

Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search

from pymilvus.model.dense import OpenAIEmbeddingFunction
from pymilvus import MilvusClient

OPENAI_API_KEY = "your-openai-api-key"
ef = OpenAIEmbeddingFunction("text-embedding-3-large", api_key=OPENAI_API_KEY)

docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England."
]
# Generate embeddings for documents
docs_embeddings = ef(docs)

queries = ["When was artificial intelligence founded?",
          "Where was Alan Turing born?"]
# Generate embeddings for queries
query_embeddings = ef(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})
    
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])

For more information, refer to our PyMilvus Embedding Model documentation.

Generate vector embeddings via OpenAI’s Python SDK and insert them into Zilliz Cloud for semantic search

from openai import OpenAI
from pymilvus import MilvusClient

OPENAI_API_KEY = "your-openai-api-key"
MODEL = "text-embedding-3-large"
# Use a separate name so the OpenAI client is not shadowed by the Milvus client below
openai_client = OpenAI(api_key=OPENAI_API_KEY)

docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England."
]
# Generate embeddings for documents
doc_response = openai_client.embeddings.create(input=docs, model=MODEL)
doc_embeddings = [data.embedding for data in doc_response.data]

queries = ["When was artificial intelligence founded?",
          "Where was Alan Turing born?"]
# Generate embeddings for queries
query_response = openai_client.embeddings.create(input=queries, model=MODEL)
query_embeddings = [data.embedding for data in query_response.data]

# Connect to Zilliz Cloud with Public Endpoint and API Key
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=3072,  # output size of text-embedding-3-large
    auto_id=True)

# Store each document alongside its embedding
for doc, embedding in zip(docs, doc_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
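To inspect what the search returned, the results can be paired back with the queries. The sketch below assumes that `results` holds one list of hits per query and that each hit behaves like a dict with `distance` and `entity` keys; check this against the PyMilvus version you are using. The helper `format_hits` and the hand-written `sample_results` are illustrative stand-ins, and the score shown is not a real measurement.

```python
def format_hits(queries, results):
    """Build printable lines pairing each query with its hits."""
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for hit in hits:
            lines.append(f"  {hit['distance']:.3f}  {hit['entity']['text']}")
    return lines

# Hand-written stand-in for `results`; real scores will differ.
sample_queries = ["When was artificial intelligence founded?"]
sample_results = [
    [
        {
            "id": 1,
            "distance": 0.71,
            "entity": {
                "text": "Artificial intelligence was founded as an academic discipline in 1956."
            },
        }
    ],
]
lines = format_hits(sample_queries, sample_results)
for line in lines:
    print(line)
```

With a live cluster, the same function can be called as `format_hits(queries, results)` once `queries` and `results` exist as in the listing above.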