Example Dataset Overview

We’ll use an example dataset throughout this user guide series. The dataset contains details about more than 5,000 Medium articles published in prominent publications between January 2020 and August 2020.

Obtain the dataset

This dataset is available in a public S3 storage bucket. Click here to copy the S3 URL.

To learn more about the dataset, read its introduction page on Kaggle.

Dataset schema

In the dataset, each data record has eight attributes. Use this table as a reference when you create the schema of your collection.

Field name      Type            Dimension / Max length
id              INT64           N/A
title_vector    FLOAT_VECTOR    768
title           VARCHAR         512
link            VARCHAR         512
reading_time    INT64           N/A
publication     VARCHAR         512
claps           INT64           N/A
responses       INT64           N/A
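
Before loading records into a collection, it can help to check them against the schema. The following is a minimal, standard-library-only Python sketch that encodes the eight fields listed above and validates a single record. The names FieldSpec, SCHEMA, and validate_record are hypothetical helpers introduced for illustration and are not part of any SDK. The sketch also assumes that the VARCHAR limit is counted in characters, so adjust that check if your database counts bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One row of the schema table (hypothetical helper, not an SDK class)."""
    name: str
    dtype: str                        # type name as listed in the table
    dim: Optional[int] = None         # vector dimension, FLOAT_VECTOR only
    max_length: Optional[int] = None  # maximum length, VARCHAR only


# The eight attributes of each record, copied from the schema table.
SCHEMA = [
    FieldSpec("id", "INT64"),
    FieldSpec("title_vector", "FLOAT_VECTOR", dim=768),
    FieldSpec("title", "VARCHAR", max_length=512),
    FieldSpec("link", "VARCHAR", max_length=512),
    FieldSpec("reading_time", "INT64"),
    FieldSpec("publication", "VARCHAR", max_length=512),
    FieldSpec("claps", "INT64"),
    FieldSpec("responses", "INT64"),
]


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for field in SCHEMA:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
            continue
        value = record[field.name]
        if field.dtype == "INT64":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{field.name}: expected an integer")
            elif not -2**63 <= value < 2**63:
                problems.append(f"{field.name}: outside the INT64 range")
        elif field.dtype == "VARCHAR":
            if not isinstance(value, str):
                problems.append(f"{field.name}: expected a string")
            elif len(value) > field.max_length:
                # Assumption: the limit is measured in characters.
                problems.append(f"{field.name}: longer than {field.max_length}")
        elif field.dtype == "FLOAT_VECTOR":
            if not isinstance(value, (list, tuple)) or len(value) != field.dim:
                problems.append(f"{field.name}: expected {field.dim} values")
            elif not all(isinstance(x, float) for x in value):
                problems.append(f"{field.name}: expected only floats")
    # Report attributes that the schema table does not define.
    known = {field.name for field in SCHEMA}
    for name in sorted(set(record) - known):
        problems.append(f"unexpected field: {name}")
    return problems
```

Calling validate_record on a record with all eight fields, a 768-element list of floats in title_vector, and strings of at most 512 characters returns an empty list. Any other record returns one message per problem found.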

Next steps