Module 2 – Run the data pipeline

Now that you have fully configured your environment, it’s time to process and upload the movie data.

In this module, you will download the dataset, clean and normalize the text, generate embeddings using the model you selected earlier, and upload both the vectors and full movie metadata to your deployed Chromia chain.

By the end of this module, you will have populated your on-chain database, making it ready to support semantic search.
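Before diving in, here is a rough idea of what the "clean and normalize" step looks like. This is a stdlib-only sketch for illustration; the function name and the exact cleaning rules are assumptions, not the actual script shipped in the `python/` folder:

```python
import re
import unicodedata

def normalize_text(raw: str) -> str:
    """Illustrative cleaning pass: NFC-normalize Unicode, strip stray
    markup, collapse whitespace, and trim. The real pipeline may apply
    different or additional rules."""
    text = unicodedata.normalize("NFC", raw)
    text = re.sub(r"<[^>]+>", " ", text)  # drop leftover HTML-like tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()

print(normalize_text("  The  <i>Matrix</i>\n(1999) "))  # The Matrix (1999)
```

Normalizing before embedding matters because the embedding model sees raw characters: two descriptions that differ only in markup or spacing would otherwise produce slightly different vectors.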

Activate vector_demo_env if it's not already active

If you're in the rell/ folder:

```shell
cd ../python
```

Or from the root of the project:

```shell
cd python
```

Then activate the environment:

```shell
source vector_demo_env/bin/activate
```
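If you're unsure whether activation took effect, a quick stdlib check (a sketch, not part of the course scripts) is to compare `sys.prefix` with `sys.base_prefix`; inside an activated virtual environment they differ:

```python
import sys

def in_virtualenv() -> bool:
    # In an active venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
```

Run it with `python -c` or as a small file; it should print `True` once `vector_demo_env` is active.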