Insert Entities
An entity is the basic data unit of a collection. It represents a member of a class, like a book in a library or a gene in a genome. Entities in a collection share the same set of attributes, termed the schema. You can insert entities individually or in bulk from a JSON file.
Insert entities
Use the insert API if you need to add a small set of entities. Ensure that the data meets the schema requirements of the collection. The following example shows how to insert a sample dataset.
from pymilvus import connections, Collection
import pandas as pd

# Connect to your cluster first. The URI and token are placeholders;
# replace them with your own cluster endpoint and API key.
connections.connect(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Load the JSON file downloaded from the public S3 bucket.
df = pd.read_json('medium_articles_2020.json')

# Get an existing collection.
collection = Collection("medium_articles_2020")

# Prepare the data to be inserted as one list of values per field.
# Consider splitting the data into chunks if your dataset is large.
data = []
keys = df['rows'][0].keys()
for key in keys:
    data.append([row.get(key) for row in df['rows']])

# Insert the data.
collection.insert(data)

# flush() seals the growing segment so that index building can proceed
# and num_entities reports an accurate count. Skip it if you need neither.
collection.flush()
print(collection.num_entities)
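If the dataset is too large for a single request, you can insert it in batches instead. The following is a minimal sketch of batched insertion over the column-based data list prepared above; the batch size of 1,000 is an arbitrary example, not a recommended value.

# Insert the column-based data in fixed-size batches.
# BATCH_SIZE is an arbitrary example value; tune it to your payload size.
BATCH_SIZE = 1000
num_rows = len(data[0])
for start in range(0, num_rows, BATCH_SIZE):
    batch = [column[start:start + BATCH_SIZE] for column in data]
    collection.insert(batch)
collection.flush()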
Insert entities from large files
Prepare a JSON file
In the JSON file, organize your data in a dictionary with rows as the key and a list of all data records as the value. Each record in the list is a dictionary whose keys are field names and whose values are the corresponding field values. Note that the file size should be no greater than 1 GB.
For your reference, the following is an example row-based JSON structure containing two entities.
{
    "rows": [
        {
            "id": 0,
            "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
            "title_vector": [0.041732933, 0.013779674, -0.027564144, -0.013061441],
            "link": "https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912",
            "reading_time": 13,
            "publication": " The Startup",
            "claps": 1100,
            "responses": 18
        },
        {
            "id": 1,
            "title": "Dashboards in Python: 3 Advanced Examples for Dash Beginners and Everyone Else",
            "title_vector": [0.0039737443, 0.003020432, -0.0006188639, 0.03913546],
            "link": "https://medium.com/swlh/dashboards-in-python-3-advanced-examples-for-dash-beginners-and-everyone-else",
            "reading_time": 14,
            "publication": " The Startup",
            "claps": 726,
            "responses": 3
        }
    ]
}
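If you assemble such a file programmatically, a minimal sketch like the following produces the expected row-based layout. The record and the output file name are placeholders; only the "rows" wrapper and the field-name-to-value mapping are required by the format.

import json

# Placeholder record; each dictionary maps field names to values,
# mirroring the structure shown above.
rows = [
    {
        "id": 0,
        "title": "Example title",
        "title_vector": [0.041732933, 0.013779674, -0.027564144, -0.013061441],
        "link": "https://example.com/article-0",
        "reading_time": 13,
        "publication": "Example Publication",
        "claps": 1100,
        "responses": 18
    }
]

# Write the row-based JSON file; keep the result under 1 GB.
with open("my_articles.json", "w") as f:
    json.dump({"rows": rows}, f)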
Bulk insert
Once you finish creating a collection and setting the index type, the collection is displayed in the Collections tab of the database details page on Zilliz Cloud.
Click the name of the collection to view its details. On the Collection tab, click Import Now in the Import Data to the Collection area. You can either import a local file or import a file from an S3 bucket.
To import a dataset, you need to upload the file containing the dataset to Zilliz Cloud. For your convenience, a sample JSON file has been uploaded to a public storage bucket; copy its URL for use in the following steps. No Access Key (AK) or Secret Key (SK) is required to access this file.
To import a local file
To import a local file, download the sample JSON file, click upload a file, select the downloaded file, and click Import. The file size should be no greater than 512 MB.
To import a file from an S3 bucket
Before you can import data, you have to upload it to one of your AWS S3 buckets and set the bucket access policy to grant at least the following permissions: s3:GetObject, s3:GetObjectVersion, and s3:GetBucketLocation.
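As a sketch, such a policy could be attached with boto3 as follows. The bucket name and principal ARN are placeholders, since the principal to authorize depends on your Zilliz Cloud setup; this is an illustration of the required permissions, not an official policy template.

import json
import boto3

# Placeholder values; replace with your bucket and the principal
# that Zilliz Cloud uses to read from your bucket.
BUCKET = "YOUR_BUCKET_NAME"
PRINCIPAL = "YOUR_PRINCIPAL_ARN"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level permissions apply to the objects in the bucket.
            "Effect": "Allow",
            "Principal": {"AWS": PRINCIPAL},
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*"
        },
        {
            # The bucket-level permission applies to the bucket itself.
            "Effect": "Allow",
            "Principal": {"AWS": PRINCIPAL},
            "Action": ["s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{BUCKET}"
        }
    ]
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))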
Then you can import a file from the S3 bucket. For demonstration purposes, enter the copied S3 URL in S3 File Path and click Import.