Ingestion & ETL (Airflow → PostgreSQL)
Role of Airflow
Airflow orchestrates all external ETL workflows, including:
- scraping jobs,
- normalization and enrichment,
- exports to downstream storage.
Airflow is not used for ML experimentation.
PostgreSQL as canonical store
PostgreSQL acts as:
- the canonical structured representation of scraped data,
- a stable source for downstream raw exports.
Characteristics:
- normalized schema,
- constraints and indexes where applicable,
- append-only or slowly mutating tables.
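The characteristics above can be illustrated with a minimal sketch. The table and column names (`sources`, `scraped_items`) are hypothetical and not taken from the actual schema. To stay runnable without a database server, the sketch uses Python's built-in `sqlite3` module as a stand-in; against PostgreSQL the same ideas apply, but column types and the driver would differ.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
DDL = """
CREATE TABLE sources (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE scraped_items (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES sources(id),
    url        TEXT NOT NULL,
    scraped_at TEXT NOT NULL,
    payload    TEXT NOT NULL,
    UNIQUE (source_id, url, scraped_at)
);
CREATE INDEX idx_items_scraped_at ON scraped_items (scraped_at);
"""


def open_store():
    """Create an in-memory stand-in for the canonical store."""
    conn = sqlite3.connect(":memory:")
    # SQLite needs this switch to enforce foreign keys; PostgreSQL enforces them by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def append_item(conn, source_id, url, scraped_at, payload):
    """Append-only write: each scrape adds a new row, existing rows are never updated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO scraped_items (source_id, url, scraped_at, payload) "
            "VALUES (?, ?, ?, ?)",
            (source_id, url, scraped_at, payload),
        )
```

Here the foreign key keeps every item tied to a known source, the unique constraint rejects duplicate loads of the same scrape, and the index on `scraped_at` supports time-bounded reads by downstream exports.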
ETL guarantees
The ETL layer guarantees:
- referential integrity where possible,
- schema stability for downstream consumers,
- separation between ingestion and analytics workloads.
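As one possible way to check the schema stability guarantee, the sketch below compares a table's current columns against a contract that downstream consumers rely on. The contract contents and the `schema_drift` helper are hypothetical; in practice the `actual` mapping could be read from PostgreSQL's `information_schema.columns` view (`column_name`, `data_type`).

```python
# Hypothetical contract: the columns and types downstream consumers depend on.
EXPECTED_COLUMNS = {
    "id": "integer",
    "source_id": "integer",
    "url": "text",
    "scraped_at": "timestamp with time zone",
}


def schema_drift(expected, actual):
    """Return the differences that would break downstream consumers.

    Extra columns in `actual` are tolerated, since additive changes
    do not break readers that select columns by name.
    """
    problems = []
    for name, type_ in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != type_:
            problems.append(f"type changed: {name} ({type_} -> {actual[name]})")
    return problems
```

An empty result means the table still satisfies the contract; any entry is a reason to stop before exporting.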
Monitoring
ETL pipelines are monitored for:
- execution success/failure,
- data volume anomalies,
- freshness delays.
Critical failures block downstream raw exports.
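The three checks and the export gate can be sketched as plain functions. Everything named here is an assumption made for illustration: the `RunStats` fields, the 50% volume tolerance, the 24-hour staleness limit, and the choice to treat only execution failure as critical. The actual thresholds and the definition of "critical" are not specified in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunStats:
    """Hypothetical summary of one pipeline run."""
    succeeded: bool
    rows_loaded: int
    expected_rows: int        # baseline, e.g. a trailing average (assumption)
    last_loaded_at: datetime  # timestamp of the newest loaded record


def check_run(stats, now, volume_tolerance=0.5, max_staleness=timedelta(hours=24)):
    """Return the list of issues detected for a run (empty if healthy)."""
    issues = []
    if not stats.succeeded:
        issues.append("execution_failed")
    if stats.expected_rows > 0:
        # Relative deviation from the baseline, in either direction.
        deviation = abs(stats.rows_loaded - stats.expected_rows) / stats.expected_rows
        if deviation > volume_tolerance:
            issues.append("volume_anomaly")
    if now - stats.last_loaded_at > max_staleness:
        issues.append("stale_data")
    return issues


def exports_blocked(issues, critical=frozenset({"execution_failed"})):
    """Block downstream raw exports if any detected issue is critical."""
    return any(issue in critical for issue in issues)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stats = RunStats(True, 950, 1000, now - timedelta(hours=2))
    issues = check_run(stats, now)
    print(issues, exports_blocked(issues))
```

Keeping detection (`check_run`) separate from the gating decision (`exports_blocked`) lets non-critical issues raise alerts without halting exports.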