Ingestion & ETL (Airflow → PostgreSQL)

Role of Airflow

Airflow orchestrates all external ETL workflows, including:

  • scraping jobs,
  • normalization and enrichment,
  • exports to downstream storage.
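
The three stages listed above can be sketched as plain Python callables chained in order. This is a minimal illustration, not the project's actual DAG: the function names (scrape, normalize, export), the record fields, and the in-memory list standing in for downstream storage are all assumptions. In Airflow, each function would typically become one task, with the same ordering declared as task dependencies.

```python
from datetime import datetime, timezone


def scrape() -> list[dict]:
    """Hypothetical scraping step: returns raw records as fetched."""
    return [
        {"url": "https://example.com/a", "title": "  First Item "},
        {"url": "https://example.com/b", "title": "Second Item"},
    ]


def normalize(raw: list[dict]) -> list[dict]:
    """Normalization and enrichment: trim text, add an ingestion timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        {"url": r["url"], "title": r["title"].strip(), "ingested_at": now}
        for r in raw
    ]


def export(rows: list[dict], sink: list[dict]) -> int:
    """Export step: append rows to a sink (a list standing in for storage)."""
    sink.extend(rows)
    return len(rows)


def run_pipeline(sink: list[dict]) -> int:
    # Same ordering an orchestrator would enforce: scrape -> normalize -> export.
    return export(normalize(scrape()), sink)
```

Keeping each stage a pure function of its input makes it straightforward to retry a single stage, which is the main thing an orchestrator adds on top of this ordering.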

Airflow is not used for ML experimentation.


PostgreSQL as canonical store

PostgreSQL acts as:

  • the canonical structured representation of scraped data,
  • a stable source for downstream raw exports.

Characteristics:

  • normalized schema,
  • constraints and indexes where applicable,
  • append-only or slowly mutating tables.
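
As a rough illustration of these characteristics, the sketch below defines a small normalized schema with constraints and an index, then appends rows to it. Everything specific here is an assumption: the table names (sources, listings), the columns, and the helper functions are hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so the example runs without a server. The DDL would need PostgreSQL types (for example timestamptz) in a real deployment.

```python
import sqlite3

# Hypothetical schema: `sources` and `listings` are illustrative names only.
DDL = """
CREATE TABLE sources (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE listings (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES sources(id),
    url        TEXT NOT NULL UNIQUE,
    scraped_at TEXT NOT NULL
);
CREATE INDEX idx_listings_scraped_at ON listings (scraped_at);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when asked; PostgreSQL always does.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def insert_listing(conn, source_id: int, url: str, scraped_at: str) -> bool:
    """Append one row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO listings (source_id, url, scraped_at) "
                "VALUES (?, ?, ?)",
                (source_id, url, scraped_at),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows are only ever inserted in this sketch, which matches the append-only case; a slowly mutating table would add narrowly scoped updates on top of the same constraints.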

ETL guarantees

The ETL layer guarantees:

  • referential integrity where possible,
  • schema stability for downstream consumers,
  • separation between ingestion and analytics workloads.
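
One way to make the schema-stability guarantee checkable is to compare a declared column contract against the columns actually present. The sketch below is an assumption about how such a check might look, not a description of the existing pipeline: EXPECTED_COLUMNS, its column names, and schema_drift are hypothetical. In PostgreSQL, the actual mapping could be read from information_schema.columns (column_name, data_type).

```python
# Hypothetical contract that downstream consumers rely on.
EXPECTED_COLUMNS = {
    "id": "integer",
    "source_id": "integer",
    "url": "text",
    "scraped_at": "timestamp with time zone",
}


def schema_drift(expected: dict[str, str], actual: dict[str, str]) -> dict:
    """Report breaking changes: columns that vanished or changed type.

    Extra columns in `actual` are treated as non-breaking and ignored.
    """
    missing = sorted(set(expected) - set(actual))
    retyped = sorted(
        col for col in expected
        if col in actual and expected[col] != actual[col]
    )
    return {"missing": missing, "retyped": retyped}


def is_stable(expected: dict[str, str], actual: dict[str, str]) -> bool:
    drift = schema_drift(expected, actual)
    return not drift["missing"] and not drift["retyped"]
```

Treating added columns as compatible is a design choice in this sketch; a stricter contract could flag them as well.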


Monitoring

ETL pipelines are monitored for:

  • execution success/failure,
  • data volume anomalies,
  • freshness delays.

Critical failures block downstream raw exports.
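
The three monitored signals and the export gate can be sketched as a single check run before each export. This is a minimal sketch under stated assumptions: RunStats, check_run, export_allowed, and the thresholds (50% volume drift, 24 hours of staleness) are hypothetical, and every detected problem is treated as critical here because the document does not define which failures count as critical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunStats:
    succeeded: bool           # did the ETL run finish without error
    row_count: int            # rows loaded by this run
    baseline_rows: int        # e.g. a trailing average of recent runs
    last_loaded_at: datetime  # timezone-aware timestamp of the newest row


def check_run(
    stats: RunStats,
    now: datetime,
    max_volume_drift: float = 0.5,               # assumed threshold
    max_staleness: timedelta = timedelta(hours=24),  # assumed threshold
) -> list[str]:
    """Return a list of detected problems; an empty list means healthy."""
    problems = []
    if not stats.succeeded:
        problems.append("execution failed")
    if stats.baseline_rows > 0:
        drift = abs(stats.row_count - stats.baseline_rows) / stats.baseline_rows
        if drift > max_volume_drift:
            problems.append(f"volume anomaly: {drift:.0%} off baseline")
    if now - stats.last_loaded_at > max_staleness:
        problems.append("data is stale")
    return problems


def export_allowed(problems: list[str]) -> bool:
    # Assumption: any problem is critical and blocks the export.
    return not problems
```

In an orchestrated pipeline, a check like this would typically run as its own step ahead of the export step, so a failed check stops the export from starting.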