Example Dataset Overview
We’ll use an example dataset throughout this user guide series. The dataset contains details about more than 5,000 Medium articles published in prominent publications between January 2020 and August 2020.
Obtain the dataset
This dataset is available in public storage buckets on Amazon S3 and Google Cloud Storage (GCS).
For a database deployed on Amazon Web Services (AWS), copy the dataset's S3 URL.
For a database deployed on Google Cloud Platform (GCP), copy the dataset's GCS URL.
To learn more about the dataset, see its introduction page on Kaggle.
Dataset schema
Each data record in the dataset has eight attributes. Use the following table as a reference when you create the schema of your collection.
Field name | Type | Dimension / Max length |
---|---|---|
id | INT64 | N/A |
title_vector | FLOAT_VECTOR | 768 |
title | VARCHAR | 512 |
link | VARCHAR | 512 |
reading_time | INT64 | N/A |
publication | VARCHAR | 512 |
claps | INT64 | N/A |
responses | INT64 | N/A |
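The schema table above can be mirrored in code to check records before you insert them into a collection. The following is a minimal, library-agnostic sketch that uses only the Python standard library. `Field`, `SCHEMA`, and `validate` are hypothetical names, not part of any database SDK. The sketch also assumes that a VARCHAR max length is measured in UTF-8 bytes, which is the conservative reading; check your database's documentation for the exact rule.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical, library-agnostic mirror of one row of the schema table.
@dataclass(frozen=True)
class Field:
    name: str
    dtype: str                        # "INT64", "FLOAT_VECTOR", or "VARCHAR"
    dim: Optional[int] = None         # vector dimension (FLOAT_VECTOR only)
    max_length: Optional[int] = None  # max length (VARCHAR only)


# The eight attributes from the table, in the same order.
SCHEMA = [
    Field("id", "INT64"),
    Field("title_vector", "FLOAT_VECTOR", dim=768),
    Field("title", "VARCHAR", max_length=512),
    Field("link", "VARCHAR", max_length=512),
    Field("reading_time", "INT64"),
    Field("publication", "VARCHAR", max_length=512),
    Field("claps", "INT64"),
    Field("responses", "INT64"),
]

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits SCHEMA."""
    errors = []
    expected = {f.name for f in SCHEMA}
    for extra in sorted(set(record) - expected):
        errors.append(f"unexpected field: {extra}")
    for f in SCHEMA:
        if f.name not in record:
            errors.append(f"missing field: {f.name}")
            continue
        value = record[f.name]
        if f.dtype == "INT64":
            # bool is a subclass of int in Python, so reject it explicitly.
            ok = (isinstance(value, int) and not isinstance(value, bool)
                  and INT64_MIN <= value <= INT64_MAX)
            if not ok:
                errors.append(f"{f.name}: expected a 64-bit integer")
        elif f.dtype == "VARCHAR":
            # Assumption: the limit counts UTF-8 bytes, not characters.
            ok = isinstance(value, str) and len(value.encode("utf-8")) <= f.max_length
            if not ok:
                errors.append(f"{f.name}: expected a string of at most {f.max_length} UTF-8 bytes")
        elif f.dtype == "FLOAT_VECTOR":
            ok = (isinstance(value, (list, tuple)) and len(value) == f.dim
                  and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value))
            if not ok:
                errors.append(f"{f.name}: expected {f.dim} floats")
    return errors


# Usage: a made-up record that satisfies every constraint in the table.
record = {
    "id": 0,
    "title_vector": [0.0] * 768,
    "title": "An example title",
    "link": "https://example.com/article",
    "reading_time": 5,
    "publication": "An example publication",
    "claps": 100,
    "responses": 3,
}
print(validate(record))  # → []
```

In practice you would declare these fields through your database client's schema API. This sketch only illustrates how the Type and Dimension / Max length columns constrain each record.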
Next steps