Insert Entities

An entity is the basic data unit of a collection. It represents a member of a class, like a book in a library or a gene in a genome. All entities in a collection share the same set of attributes, termed a schema. You can insert an individual entity, or insert multiple entities in a batch from a JSON file.

Insert entities

Use the insert API to add a small set of entities. Ensure that the data meets the schema requirements of the collection. The following example illustrates how to insert a sample dataset.

import pandas as pd
from pymilvus import Collection

# Load the JSON file downloaded from the public S3 bucket.
df = pd.read_json('medium_articles_2020.json')

# Get an existing collection.
collection = Collection("medium_articles_2020")

# Prepare the data to be inserted as a list of columns, one per field.
# Consider splitting the data into chunks if your dataset is large.
data = []
keys = df['rows'][0].keys()

for key in keys:
    data.append([row.get(key) for row in df['rows']])

# Insert the data.
collection.insert(data)

# Flush the inserted data to storage so that it is counted by num_entities
# and available for index building. Do not call flush() unnecessarily.
collection.flush()
print(collection.num_entities)
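The comment above suggests splitting large datasets into chunks. One way to do that with column-based data is sketched below; `batch_columns` and `BATCH_SIZE` are illustrative names, not part of the pymilvus API:

```python
# A minimal sketch of batching column-based data before insertion.
# batch_columns and BATCH_SIZE are illustrative, not pymilvus APIs.
BATCH_SIZE = 1000

def batch_columns(data, batch_size):
    """Yield column-based slices of at most batch_size rows each."""
    num_rows = len(data[0]) if data else 0
    for start in range(0, num_rows, batch_size):
        yield [column[start:start + batch_size] for column in data]

# Usage, assuming `collection` and `data` from the example above:
# for batch in batch_columns(data, BATCH_SIZE):
#     collection.insert(batch)
```

Each yielded batch keeps the same column layout that `insert()` expects, so the call site does not change.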

Insert entities from an AWS S3 bucket

Prepare a JSON file

In the JSON file, organize your data as a dictionary with rows as the key and a list of all data records as the value. Each record in the list is itself a dictionary whose keys are field names and whose values are the corresponding field values. Note that the file size must not exceed 1 GB.

For your reference, the following is an example row-based JSON structure containing two entities.

{
  "rows": 
  [
      {
        "id": 0,
        "title": "The Reported Mortality Rate of Coronavirus Is Not Important", 
        "title_vector": [0.041732933, 0.013779674, -0.027564144, -0.013061441], 
        "link": "https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912",
        "reading_time": 13,
        "publication": " The Startup",
        "claps": 1100,
        "responses": 18    
      }, 
      {
        "id": 1,
        "title": "Dashboards in Python: 3 Advanced Examples for Dash Beginners and Everyone Else", 
        "title_vector": [0.0039737443, 0.003020432, -0.0006188639, 0.03913546], 
        "link": "https://medium.com/swlh/dashboards-in-python-3-advanced-examples-for-dash-beginners-and-everyone-else",
        "reading_time": 14,
        "publication": " The Startup",
        "claps": 726,
        "responses": 3  
      }
  ]
}
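A file in this row-based format can be produced with the standard `json` module. The sketch below uses illustrative records and a placeholder file name, and also checks the 1 GB size limit mentioned above:

```python
import json
import os

# Illustrative records; in practice these come from your own dataset
# and must match the collection schema.
rows = [
    {"id": 0, "title": "Example A", "title_vector": [0.1, 0.2, 0.3, 0.4]},
    {"id": 1, "title": "Example B", "title_vector": [0.5, 0.6, 0.7, 0.8]},
]

# Wrap the records under the "rows" key, as the bulk-insert format expects.
with open("example_rows.json", "w") as f:
    json.dump({"rows": rows}, f)

# The file must be no larger than 1 GB.
assert os.path.getsize("example_rows.json") <= 1 * 1024 ** 3
```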

Bulk insert

Before you can bulk-insert data, upload it to one of your AWS S3 buckets and set the bucket access policy to grant at least the following permissions: s3:GetObject, s3:GetObjectVersion, and s3:GetBucketLocation.
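A bucket policy granting those permissions might look like the sketch below. The bucket name and principal ARN are placeholders; adapt them to your own account (note that s3:GetBucketLocation applies to the bucket itself, while the object permissions apply to its contents):

```python
import json

# Sketch of a minimal S3 bucket policy for read access.
# "your-bucket" and the principal ARN are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": "arn:aws:s3:::your-bucket/*",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::your-bucket",
        },
    ],
}

print(json.dumps(policy, indent=2))
```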

For your convenience, a sample JSON file has been uploaded to a public S3 storage bucket. Click here to copy the file URL. No access key (AK) or secret key (SK) is required to access this file.

Once you finish creating a collection and setting the index type, the collection is displayed in the Collections tab of the database details page on Zilliz Cloud, similar to the one in the following snapshot.

View collections

Click the name of the collection to view its details. On the Collection tab, click Import Now in the Import Data area, and select Import a file from S3 in the prompted dialog box.

Import a file from S3

Next steps