OpenAI / text-embedding-3-small
Milvus & Zilliz Cloud Integrated
Task: Embedding
Modality: Text
Similarity Metric: Any (Normalized)
License: Proprietary
Dimensions: 1536
Max Input Tokens: 8191
Price: $0.02 / 1M tokens
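The "Any (Normalized)" entry reflects that the model returns unit-length vectors, so cosine similarity, inner product, and Euclidean distance all rank neighbors the same way. The sketch below illustrates this with small hand-made vectors rather than real embeddings; the helper names are ours and are not part of any SDK.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([1.0, 2.0, 2.0])

# For unit vectors, cosine similarity equals the inner product,
# and squared Euclidean distance equals 2 - 2 * inner product.
print(cosine(a, b), dot(a, b), squared_l2(a, b), 2 - 2 * dot(a, b))
```

Because the three quantities are monotonically related for unit vectors, the choice of metric for the collection does not change which documents come back first.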
Introduction to text-embedding-3-small
text-embedding-3-small is OpenAI’s small text embedding model and produces embeddings with 1536 dimensions. Of OpenAI’s text embedding models compared here (text-embedding-ada-002, text-embedding-3-large, and text-embedding-3-small), it has the lowest price per token, and it scores higher than text-embedding-ada-002 on both benchmarks listed below. It is a good fit for general-purpose vector search applications.
Let’s take a quick look at some basics.
Model | Dimensions | Max Tokens | MIRACL avg | MTEB avg | Price |
---|---|---|---|---|---|
text-embedding-3-large | 3072 | 8191 | 54.9 | 64.6 | $0.13 / 1M tokens |
text-embedding-ada-002 | 1536 | 8191 | 31.4 | 61.0 | $0.10 / 1M tokens |
text-embedding-3-small | 1536 | 8191 | 44.0 | 62.3 | $0.02 / 1M tokens |
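To make the price column concrete, here is a minimal sketch that estimates the embedding cost for a corpus of a given token count. It uses only the prices from the table above; the function name and the 50M-token corpus size are illustrative assumptions.

```python
# Prices in USD per 1M tokens, copied from the table above
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
    "text-embedding-3-small": 0.02,
}

def estimate_cost(total_tokens, model):
    """Return the estimated cost in USD to embed `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# Hypothetical corpus of 50 million tokens
for name in PRICE_PER_MILLION_TOKENS:
    print(f"{name}: ${estimate_cost(50_000_000, name):.2f}")
```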
How to generate vector embeddings with text-embedding-3-small
There are two primary ways to create vector embeddings:
- PyMilvus: the Python SDK for Milvus, which integrates the text-embedding-3-small model.
- OpenAI Embedding: the Python SDK offered by OpenAI.
Once the vector embeddings are generated, they can be stored in Zilliz Cloud (a fully managed vector database service powered by Milvus) and used for semantic similarity search. Here are four key steps:
- Sign up for a Zilliz Cloud account for free.
- Set up a serverless cluster and obtain the Public Endpoint and API Key.
- Create a vector collection and insert your vector embeddings.
- Run a semantic search on the stored embeddings.
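Once step 2 gives you the Public Endpoint and API Key, one option is to keep them out of source code and read them from environment variables. The sketch below is one way to do that; the variable names `ZILLIZ_PUBLIC_ENDPOINT` and `ZILLIZ_API_KEY` and the helper `load_zilliz_credentials` are assumptions for illustration, not names required by Zilliz Cloud or PyMilvus.

```python
import os

def load_zilliz_credentials(env=None):
    """Read the endpoint and API key from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    endpoint = env.get("ZILLIZ_PUBLIC_ENDPOINT")  # hypothetical variable name
    api_key = env.get("ZILLIZ_API_KEY")           # hypothetical variable name
    missing = [
        name
        for name, value in (
            ("ZILLIZ_PUBLIC_ENDPOINT", endpoint),
            ("ZILLIZ_API_KEY", api_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return endpoint, api_key
```

The returned pair can then be passed as `uri` and `token` when constructing `MilvusClient`.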
Generate vector embeddings via PyMilvus and insert them into Zilliz Cloud for semantic search
from pymilvus import model, MilvusClient

OPENAI_API_KEY = "your-openai-api-key"
# Placeholders: replace with your cluster's Public Endpoint and API Key
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

ef = model.dense.OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small",
    api_key=OPENAI_API_KEY,
)

# Generate embeddings for documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]
docs_embeddings = ef.encode_documents(docs)

# Generate embeddings for queries
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]
query_embeddings = ef.encode_queries(queries)

# Connect to Zilliz Cloud with Public Endpoint and API Key
client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from an empty state
COLLECTION = "documents"
if client.has_collection(collection_name=COLLECTION):
    client.drop_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    dimension=ef.dim,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, docs_embeddings):
    client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with the query embeddings and return the stored text
results = client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to our PyMilvus Embedding Model documentation.
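The example above stops once `results` is returned. As a minimal sketch of reading it, the helper below assumes `results` holds one list of hits per query, and that each hit is a dict with a `distance` score and an `entity` dict carrying the requested `text` field. The helper name and the sample values are hand-written stand-ins, not real search output.

```python
def summarize_hits(queries, results):
    """Return printable lines pairing each query with its hits."""
    lines = []
    for query, hits in zip(queries, results):
        lines.append(f"Query: {query}")
        for hit in hits:
            text = hit["entity"]["text"]
            lines.append(f"  {hit['distance']:.3f}  {text}")
    return lines

# Hand-written stand-in with the assumed shape (scores are made up)
sample_queries = ["Where was Alan Turing born?"]
sample_results = [[
    {"id": 1, "distance": 0.5,
     "entity": {"text": "Born in Maida Vale, London, Turing was raised in southern England."}},
]]

for line in summarize_hits(sample_queries, sample_results):
    print(line)
```

With the real objects, the call would be `summarize_hits(queries, results)`.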
Generate embeddings via OpenAI’s Python SDK and insert them into Zilliz Cloud for semantic search
from openai import OpenAI
from pymilvus import MilvusClient

OPENAI_API_KEY = "your-openai-api-key"
# Placeholders: replace with your cluster's Public Endpoint and API Key
ZILLIZ_PUBLIC_ENDPOINT = "your-zilliz-public-endpoint"
ZILLIZ_API_KEY = "your-zilliz-api-key"

openai_client = OpenAI(api_key=OPENAI_API_KEY)

# Generate embeddings for documents
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]
doc_response = openai_client.embeddings.create(
    input=docs,
    model="text-embedding-3-small"
)
doc_embeddings = [data.embedding for data in doc_response.data]

# Generate embeddings for queries
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]
query_response = openai_client.embeddings.create(
    input=queries,
    model="text-embedding-3-small"
)
query_embeddings = [data.embedding for data in query_response.data]

# Connect to Zilliz Cloud with Public Endpoint and API Key
milvus_client = MilvusClient(
    uri=ZILLIZ_PUBLIC_ENDPOINT,
    token=ZILLIZ_API_KEY)

# Recreate the collection so the example starts from an empty state
COLLECTION = "documents"
if milvus_client.has_collection(collection_name=COLLECTION):
    milvus_client.drop_collection(collection_name=COLLECTION)
milvus_client.create_collection(
    collection_name=COLLECTION,
    dimension=1536,
    auto_id=True)

# Insert each document together with its embedding
for doc, embedding in zip(docs, doc_embeddings):
    milvus_client.insert(COLLECTION, {"text": doc, "vector": embedding})

# Search with the query embeddings and return the stored text
results = milvus_client.search(
    collection_name=COLLECTION,
    data=query_embeddings,
    consistency_level="Strong",
    output_fields=["text"])
For more information, refer to OpenAI’s Embedding Guide.
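Both examples insert one row per call, which is fine for three documents. For larger sets, `MilvusClient.insert` also accepts a list of row dicts, so rows can be sent in batches. The sketch below only prepares those batches; the helper names and the batch size are illustrative assumptions.

```python
def build_rows(docs, embeddings):
    """Pair each document with its embedding as a row dict."""
    if len(docs) != len(embeddings):
        raise ValueError("docs and embeddings must have the same length")
    return [{"text": doc, "vector": emb} for doc, emb in zip(docs, embeddings)]

def batched(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Usage with the objects from the examples above (not executed here):
# rows = build_rows(docs, doc_embeddings)
# for batch in batched(rows, 100):
#     client.insert(collection_name=COLLECTION, data=batch)
```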