Prepare Schema

A schema defines the set of fields shared by all entities in a collection. Before creating a collection, you need to prepare a schema by defining every field, in a specific order, with its name, its type, and an optional description.

Check your data

In the dataset prepared for this example, each data record has eight attributes. You need to create a field for each attribute. The following table lists the details:

Field name      Type            Dimension / Max length
id              INT64           N/A
title_vector    FLOAT_VECTOR    768
title           VARCHAR         512
link            VARCHAR         512
reading_time    INT64           N/A
publication     VARCHAR         512
claps           INT64           N/A
responses       INT64           N/A
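
Before defining the fields, it can help to confirm that a data record really matches the table above. The following is a minimal plain-Python sketch, not part of any Zilliz Cloud SDK. The EXPECTED mapping, the check_record helper, and the sample record are hypothetical names and values introduced only for illustration, and string length is counted in characters, following the description in this document.

```python
# Expected attributes, transcribed from the table above.
# Each entry maps a name to (kind, limit): the limit is the vector dimension
# for "vector", the maximum length for "str", and None for "int".
EXPECTED = {
    "id": ("int", None),
    "title_vector": ("vector", 768),
    "title": ("str", 512),
    "link": ("str", 512),
    "reading_time": ("int", None),
    "publication": ("str", 512),
    "claps": ("int", None),
    "responses": ("int", None),
}


def check_record(record):
    """Return a list of problems in one record (empty if it fits the table)."""
    problems = []
    for name, (kind, limit) in EXPECTED.items():
        if name not in record:
            problems.append(f"missing attribute: {name}")
            continue
        value = record[name]
        if kind == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"{name}: expected an integer")
        elif kind == "str":
            if not isinstance(value, str):
                problems.append(f"{name}: expected a string")
            elif len(value) > limit:
                problems.append(f"{name}: longer than {limit} characters")
        elif kind == "vector":
            if not isinstance(value, (list, tuple)) or len(value) != limit:
                problems.append(f"{name}: expected {limit} values")
    for name in record:
        if name not in EXPECTED:
            problems.append(f"unexpected attribute: {name}")
    return problems


# Made-up sample record, used only to exercise the check.
sample_record = {
    "id": 1,
    "title_vector": [0.0] * 768,
    "title": "An example title",
    "link": "https://example.com/article",
    "reading_time": 7,
    "publication": "Example Publication",
    "claps": 10,
    "responses": 2,
}

print(check_record(sample_record))  # prints []
```

Running a check like this over a few records catches missing attributes or over-long strings before they surface as insert errors.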

Create fields

The following snippet defines the schema according to the above table.

from pymilvus import FieldSchema, CollectionSchema, DataType

fields = [
    # Primary field
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    # String field, limited to 512 characters
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=512),
    # Vector field with 768 dimensions
    FieldSchema(name="title_vector", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="link", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="reading_time", dtype=DataType.INT64),
    FieldSchema(name="publication", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="claps", dtype=DataType.INT64),
    FieldSchema(name="responses", dtype=DataType.INT64)
]

In this example,

  • id is the primary field. For this field, the parameter is_primary is set to True.
  • title_vector is a vector field. The parameter dim specifies the vector dimension.
  • title, link, and publication are string fields. The parameter max_length specifies the maximum number of characters allowed in the string.
  • reading_time, claps, and responses are integer fields. No extra parameters need to be set on these fields.
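
The rules in the list above can be sketched as a small sanity check. This is plain Python and does not call pymilvus. FIELD_SPECS is a hypothetical dictionary mirror of the field definitions that reuses the keyword names passed to FieldSchema, and check_field_specs is an illustrative helper, not a library function.

```python
# Plain-dict mirror of the field definitions in this section (hypothetical
# structure; the keys reuse the FieldSchema keyword names).
FIELD_SPECS = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "title", "dtype": "VARCHAR", "max_length": 512},
    {"name": "title_vector", "dtype": "FLOAT_VECTOR", "dim": 768},
    {"name": "link", "dtype": "VARCHAR", "max_length": 512},
    {"name": "reading_time", "dtype": "INT64"},
    {"name": "publication", "dtype": "VARCHAR", "max_length": 512},
    {"name": "claps", "dtype": "INT64"},
    {"name": "responses", "dtype": "INT64"},
]


def check_field_specs(specs):
    """Return a list of rule violations (empty if the specs look consistent)."""
    problems = []

    # Exactly one field should be marked as the primary field.
    primary = [s["name"] for s in specs if s.get("is_primary")]
    if len(primary) != 1:
        problems.append(f"expected exactly one primary field, found {len(primary)}")

    for spec in specs:
        # Vector fields need a dimension.
        if spec["dtype"] == "FLOAT_VECTOR" and "dim" not in spec:
            problems.append(f"{spec['name']}: vector field needs dim")
        # String fields need a maximum length.
        if spec["dtype"] == "VARCHAR" and "max_length" not in spec:
            problems.append(f"{spec['name']}: string field needs max_length")
    return problems


print(check_field_specs(FIELD_SPECS))  # prints []
```

Integer fields pass through untouched because, as noted above, they need no extra parameters.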

Data types

For your reference, Zilliz Cloud supports the following field data types:

  • Binary vector (BINARY_VECTOR)
  • Boolean value (BOOLEAN)
  • 8-byte floating-point (DOUBLE)
  • 4-byte floating-point (FLOAT)
  • Float vector (FLOAT_VECTOR)
  • 8-bit integer (INT8)
  • 32-bit integer (INT32)
  • 64-bit integer (INT64)
  • Variable character (VARCHAR)

Note that the binary vector and float vector types can be used only for vector fields.

Next steps