# Olist E-Commerce Data Pipeline (BigQuery)

An ELT pipeline that loads the Brazilian E-Commerce dataset (Olist) into Google BigQuery and builds a star schema using dbt-bigquery, orchestrated with Dagster and validated with data quality checks.
## Overview
The pipeline ingests Olist CSV data into BigQuery raw tables, then uses dbt to create staging views and a dimensional data warehouse (marts). It supports both direct local-to-BigQuery ingestion and an optional path via Google Cloud Storage. A Dagster job runs the full flow: ingest → dbt run → dbt test → data quality. Analysis is done in a Jupyter notebook using pandas_gbq.
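The local-to-BigQuery ingestion step can be sketched with `pandas_gbq`. The project ID, raw dataset name, and the Olist CSV naming convention below are assumptions to adapt; this is a minimal sketch, not the pipeline's actual loader:

```python
import glob
import os

PROJECT_ID = "my-gcp-project"  # assumption: replace with your GCP project
RAW_DATASET = "raw_olist"      # raw dataset from the schema section


def table_name_for(csv_path: str) -> str:
    """Map an Olist CSV file name to a raw table name.

    Olist files typically follow the pattern olist_<entity>_dataset.csv,
    e.g. 'olist_orders_dataset.csv' -> 'orders'.
    """
    stem = os.path.splitext(os.path.basename(csv_path))[0]
    if stem.startswith("olist_"):
        stem = stem[len("olist_"):]
    if stem.endswith("_dataset"):
        stem = stem[: -len("_dataset")]
    return stem


def ingest(csv_dir: str) -> None:
    """Load every CSV in csv_dir into a BigQuery raw table."""
    # Imported here so the helper above stays testable without GCP access.
    import pandas as pd
    import pandas_gbq  # requires authenticated GCP credentials at runtime

    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        df = pd.read_csv(path)
        pandas_gbq.to_gbq(
            df,
            destination_table=f"{RAW_DATASET}.{table_name_for(path)}",
            project_id=PROJECT_ID,
            if_exists="replace",  # full reload; use 'append' for increments
        )
```

After ingestion, `dbt run` builds the staging views and marts on top of these raw tables, and `dbt test` validates them.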
## Schema
- Raw: dataset `raw_olist` (customers, orders, order_items, products, sellers, etc.)
- Staging: dataset `dw_stg_olist` (views)
- Marts: dataset `dw_dw` (dim_customer, dim_product, dim_seller, dim_date, fact_order_items)
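A typical analysis query joins the fact table to a dimension and aggregates. A sketch using `pandas_gbq` follows; the join key and column names (`customer_key`, `price`, `customer_state`) are illustrative assumptions, so check the dbt model definitions for the real ones:

```python
def revenue_by_state_sql(project_id: str) -> str:
    """Build a query that joins fact_order_items to dim_customer and
    sums item revenue per customer state (column names assumed)."""
    return f"""
    SELECT
      c.customer_state,
      SUM(f.price) AS revenue
    FROM `{project_id}.dw_dw.fact_order_items` AS f
    JOIN `{project_id}.dw_dw.dim_customer` AS c USING (customer_key)
    GROUP BY c.customer_state
    ORDER BY revenue DESC
    """


if __name__ == "__main__":
    import pandas_gbq  # requires authenticated GCP credentials

    df = pandas_gbq.read_gbq(
        revenue_by_state_sql("my-gcp-project"),
        project_id="my-gcp-project",
    )
    print(df.head())
```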
## Tech & Tools
- Python 3.10+
- Google Cloud (BigQuery, optional GCS)
- dbt-core with dbt-bigquery
- Dagster (orchestration)
- Great Expectations (data quality)
- Jupyter, pandas, pandas_gbq (analysis)
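The data quality step runs expectations against the loaded tables. As a minimal plain-pandas stand-in for a Great Expectations suite, the sketch below shows the kinds of checks involved; column names and the status set are illustrative assumptions:

```python
import pandas as pd


def run_order_checks(orders: pd.DataFrame) -> dict[str, bool]:
    """Simplified stand-in for a Great Expectations suite: each entry is a
    named check that must pass before the marts are considered valid.
    Column names and allowed statuses are illustrative."""
    known_statuses = [
        "delivered", "shipped", "canceled", "invoiced",
        "processing", "created", "approved", "unavailable",
    ]
    return {
        # Primary-key style checks on the orders table
        "order_id_not_null": bool(orders["order_id"].notna().all()),
        "order_id_unique": orders["order_id"].is_unique,
        # Domain check on the status column
        "status_in_known_set": bool(
            orders["order_status"].isin(known_statuses).all()
        ),
    }
```

In the orchestrated flow, the Dagster job would run these checks after `dbt test` and fail the run if any check returns `False`.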