This repository contains two Apache Airflow DAGs and a dbt project for implementing an ETL/ELT pipeline targeting stock data. The Airflow DAGs handle orchestration of data pipelines, while dbt is used for modular SQL transformations. Below is an overview of the flow for each DAG and the dbt project structure.
To run these files with Airflow and dbt, follow these steps:
- Install Apache Airflow and set up a local or cloud-based instance.
- Install dbt and configure the profiles.yml file with your database connection details.
- Install the following Python libraries:
  - apache-airflow
  - dbt-core
- Configure Airflow connection IDs and variables (if needed) for your environment.
- Place the two Airflow DAG .py files in the dags folder of your Airflow setup.
- Clone the dbt_project folder into your working directory.
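Before placing the DAGs, it can help to confirm that both libraries are importable in the environment Airflow runs in. The following is a minimal sketch, assuming that apache-airflow exposes the import name airflow and dbt-core exposes the import name dbt; the helper name missing_packages is hypothetical and not part of this repository.

```python
from importlib.util import find_spec
from typing import Dict, List

# Assumed mapping from pip package name to top-level import name.
REQUIRED_MODULES: Dict[str, str] = {
    "apache-airflow": "airflow",
    "dbt-core": "dbt",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty result only shows that the packages are installed; it does not validate profiles.yml or any Airflow connection.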
This Airflow DAG orchestrates an ETL pipeline that extracts, transforms, and loads stock data into a data warehouse.
- Extract Data:
  - Fetches raw stock data from a source (e.g., API or CSV).
- Transform Data:
  - Cleans and preprocesses the raw data to match the schema required by the database.
- Load Data:
  - Inserts the processed data into a staging table in the data warehouse.
- Extract stock data →
- Clean and preprocess the data →
- Load data into the data warehouse.
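The three stages above can be sketched as plain Python functions of the kind an Airflow task could call. This is an illustrative sketch only, not the DAG shipped in this repository: the column names (symbol, date, close), the table name stg_stock_prices, the made-up sample rows, and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Made-up sample rows for illustration; not real market data.
SAMPLE_CSV = (
    "symbol,date,close\n"
    "abc,2024-01-02,100.0\n"
    "abc,2024-01-03,\n"
    "xyz,2024-01-02,50.5\n"
)


def extract(raw_csv: str) -> List[Dict[str, str]]:
    """Extract: parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[str, str, float]]:
    """Transform: drop rows without a close price and normalise types."""
    cleaned = []
    for row in rows:
        close = (row.get("close") or "").strip()
        if not close:
            continue  # skip incomplete rows
        cleaned.append((row["symbol"].strip().upper(), row["date"].strip(), float(close)))
    return cleaned


def load(conn: sqlite3.Connection, records: List[Tuple[str, str, float]]) -> int:
    """Load: insert records into a staging table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_stock_prices (symbol TEXT, date TEXT, close REAL)"
    )
    conn.executemany("INSERT INTO stg_stock_prices VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM stg_stock_prices").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for the warehouse
    print(load(connection, transform(extract(SAMPLE_CSV))))
```

In an Airflow DAG, each function would typically become its own task so that a failure in one stage can be retried without repeating the others.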
This Airflow DAG integrates with dbt to perform ELT operations, focusing on transformations directly in the database.
- Load Data:
  - Inserts raw stock data into a database using Airflow tasks.
- Run dbt Transformations:
  - Executes dbt commands (dbt run, dbt test) to transform raw data into models and data marts.
- Validate Data:
  - Runs dbt tests to validate model integrity.
- Load raw stock data →
- Execute dbt transformations →
- Validate model outputs.
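One way to trigger the dbt steps from a Python task is to shell out to the dbt CLI. The sketch below is an assumption about how this could be wired, not the code in this repository: the helper names build_dbt_command and run_dbt_steps are hypothetical, and the project directory is passed in by the caller. It relies only on the dbt run and dbt test subcommands and the --project-dir and --profiles-dir options.

```python
import subprocess
from typing import Callable, List, Optional


def build_dbt_command(
    subcommand: str, project_dir: str, profiles_dir: Optional[str] = None
) -> List[str]:
    """Build the argument list for a dbt CLI call such as `dbt run`."""
    cmd = ["dbt", subcommand, "--project-dir", project_dir]
    if profiles_dir:
        cmd += ["--profiles-dir", profiles_dir]
    return cmd


def run_dbt_steps(
    project_dir: str,
    profiles_dir: Optional[str] = None,
    runner: Callable = subprocess.run,
) -> None:
    """Run `dbt run` and then `dbt test`, stopping at the first failure."""
    for subcommand in ("run", "test"):
        # check=True raises CalledProcessError on a non-zero exit code,
        # which would mark the calling Airflow task as failed.
        runner(build_dbt_command(subcommand, project_dir, profiles_dir), check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them.
    for step in ("run", "test"):
        print(" ".join(build_dbt_command(step, "dbt_project")))
```

The same commands could instead be issued from Airflow's BashOperator; building them in Python mainly makes the argument list easy to unit test.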
- Raw Models:
  - raw/stock_data.sql: Represents the raw stock data ingested into the warehouse.
- Transformations:
  - transformations/stock_analysis.sql: Analyzes stock data (e.g., calculating moving averages, RSI).
- Data Marts:
  - marts/stock_mart.sql: Final datasets prepared for BI tools.
- Snapshots:
  - snapshots/stock_snapshot.sql: Tracks changes over time using Slowly Changing Dimensions (SCD).
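To make the indicators mentioned for the transformation model concrete, here is a small Python sketch of a simple moving average and an RSI. It is not a translation of stock_analysis.sql, whose exact logic is not shown here. The RSI below uses plain averages of gains and losses over the window, which is an assumption; other variants, such as Wilder's smoothing, give different values.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Mean of the last `window` prices; None until enough data exists."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the price leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def rsi(prices: List[float], period: int = 14) -> List[Optional[float]]:
    """RSI from summed gains and losses over the last `period` price changes."""
    out: List[Optional[float]] = [None] * len(prices)
    for i in range(period, len(prices)):
        gains = losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = prices[j] - prices[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change
        if losses == 0:
            out[i] = 100.0  # convention used here when the window has no losses
        else:
            rs = gains / losses
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out


if __name__ == "__main__":
    print(simple_moving_average([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
    print(rsi([10, 11, 10, 11, 10], period=2))        # [None, None, 50.0, 50.0, 50.0]
```

In the dbt model, the same calculations would normally be expressed with SQL window functions so that they run inside the warehouse.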
- Raw Table: Holds the ingested raw stock data.
- Transformed Table: Contains clean and processed data ready for analysis.
- Data Marts: Consolidated tables for reporting and visualization.
Created by: • Nikhil Swami ([email protected]) • Deeksha Chauhan ([email protected])