Note: This webinar was recorded in June 2021. Since then, Airflow 2.3 has been released, adding dynamic task mapping, which has changed and improved many of the patterns shown in this webinar. For the latest best practices around dynamic tasks, check out our newer Dynamic Tasks in Airflow webinar, the Astronomer Academy module Airflow: Dynamic Task Mapping, and our Create dynamic Airflow tasks guide. For information on how to dynamically generate DAGs, see the Dynamically generate DAGs in Airflow guide and the Airflow: Dynamic DAGs academy module.
The simplest way of creating an Airflow DAG is to write it as a static Python file. However, sometimes manually writing DAGs isn’t practical.
Maybe you have hundreds or thousands of DAGs that do similar things, with just a parameter changing between them. Or maybe you need a set of DAGs to load tables, but don’t want to manually update DAGs every time those tables change.
In these cases, and others, it can make more sense to dynamically generate DAGs. Because everything in Airflow is code, you can dynamically generate DAGs using Python alone.
In this webinar, we’ll talk about when you might want to dynamically generate your DAGs, show a couple of methods for doing so, and discuss problems that can arise when implementing dynamic generation at scale.
In this webinar we cover:
- How Airflow identifies a DAG
- Use cases for dynamically generating DAGs
- Commonly used methods for dynamic generation
- Pitfalls and common issues with dynamic generation
Generating DAGs - The Static Way
Most people who have used Airflow are familiar with defining DAGs statically.
You create a Python file, instantiate your DAG, and define your tasks.
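For reference, a minimal static DAG might look like the sketch below (the dag_id, schedule, and task are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("hello")


# One file, one statically defined DAG
with DAG(
    dag_id="example_static_dag",  # hypothetical name
    start_date=datetime(2021, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)
```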
But What Actually Makes a DAG?
- Airflow executes all Python code in the DAG_FOLDER and loads any DAG object found in globals()
- This means that any Python code that generates a DAG object can be used to create DAGs
A DAG is dynamically generated when each parsing of the DAG file can produce different results.
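As a minimal sketch, the loop below creates three DAG objects in a single file and registers each one in globals() under a unique name (the environment names and schedule are hypothetical):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

# Each pass through the loop builds a distinct DAG object
for env in ["dev", "staging", "prod"]:
    dag_id = f"example_{env}_pipeline"

    with DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 6, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        DummyOperator(task_id="start")

    # Assigning the DAG into globals() is what makes Airflow pick it up
    globals()[dag_id] = dag
```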
Why is this useful?
Dynamically generating DAGs can be helpful when you have DAGs that follow a similar pattern, and:
- Want to automate migration from a legacy system to Airflow
- Have only a parameter changing between DAGs
- Have DAGs that are dependent on the changing structure of a source system
- Want to institute standards within DAGs across your team or organization
Ways to Dynamically Generate DAGs: Single File
Create a Python script that lives in your DAG_FOLDER and generates DAG objects.
You might have a function that creates a DAG based on some parameters, and a loop that calls that function for each input (a minimal sketch follows the list below).
Those parameters may come from:
- Within the file
- An Airflow variable
- Airflow connections
- Etc.
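Here is a minimal sketch of the single-file method, pulling its parameters from a hypothetical Airflow variable named dag_configs that holds a JSON list:

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash import BashOperator


def create_dag(dag_id, schedule, command):
    """Build one DAG from a set of parameters."""
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 6, 1),
        schedule_interval=schedule,
        catchup=False,
    ) as dag:
        BashOperator(task_id="run", bash_command=command)
    return dag


# Hypothetical variable value, e.g.:
# [{"dag_id": "load_table_a", "schedule": "@daily", "command": "echo a"}]
# Note: Variable.get at the top level runs on every parse, so keep it cheap.
configs = json.loads(Variable.get("dag_configs", default_var="[]"))

for config in configs:
    globals()[config["dag_id"]] = create_dag(
        config["dag_id"], config["schedule"], config["command"]
    )
```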
Ways to Dynamically Generate DAGs: Multiple Files
Create a script (in Python or another language) that generates complete DAG .py files, which are then loaded into your Airflow environment.
This is most straightforward when you are parameterizing the same DAG structure and want to automatically read those parameters from YAML, JSON, etc.
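As a sketch of this approach, the standalone script below (run manually or in CI/CD, not from the DAG_FOLDER) reads hypothetical JSON config files and writes one .py file per config from a string template; all filenames and paths are assumptions:

```python
# generate_dags.py -- run outside Airflow, e.g., in a CI/CD pipeline
import json
from pathlib import Path

TEMPLATE = '''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="{dag_id}",
    start_date=datetime(2021, 6, 1),
    schedule_interval="{schedule}",
    catchup=False,
) as dag:
    BashOperator(task_id="run", bash_command="{command}")
'''

config_dir = Path("dag_configs")  # one JSON file per DAG (hypothetical layout)
output_dir = Path("dags")         # your Airflow DAG_FOLDER

for config_file in config_dir.glob("*.json"):
    config = json.loads(config_file.read_text())
    dag_file = output_dir / f"{config['dag_id']}.py"
    dag_file.write_text(TEMPLATE.format(**config))
    print(f"Wrote {dag_file}")
```

Because each generated file is a plain static DAG, the scheduler does no extra work at parse time; the cost is an extra generation step whenever the configs change.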
Pros and Cons
Scalability
Any code in the DAG_FOLDER is executed on every scheduler heartbeat. Methods where that code dynamically generates DAGs at parse time, such as the single-file method, are more likely to cause performance issues at scale.
If DAG parsing time exceeds the scheduler heartbeat interval, the scheduler can get locked up and tasks won't be executed.
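One way to sanity-check parsing time is to time a DagBag load yourself; a rough sketch, assuming it runs inside your Airflow environment:

```python
import time

from airflow.models import DagBag

start = time.perf_counter()
dag_bag = DagBag()  # parses every file in the configured DAG_FOLDER
elapsed = time.perf_counter() - start

print(f"Parsed {len(dag_bag.dags)} DAGs in {elapsed:.2f}s")
if dag_bag.import_errors:
    print("Import errors:", dag_bag.import_errors)
```

Airflow 2 also ships an airflow dags report CLI command that reports per-file parse times.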
Community Tools
A notable community tool for dynamically creating DAGs is dag-factory, an open source Python library for dynamically generating Airflow DAGs from YAML files.
https://github.com/ajbosco/dag-factory
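Per the dag-factory README, usage looks roughly like this (the YAML config path is hypothetical):

```python
from airflow import DAG  # noqa: F401  -- keeps "airflow" and "DAG" in the file so the processor parses it
import dagfactory

# Point dag-factory at a YAML file describing your DAGs
dag_factory = dagfactory.DagFactory("/usr/local/airflow/dags/config_file.yml")

dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())  # registers the generated DAGs in globals()
```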
Code Examples
This repo contains an Astronomer project with multiple examples showing how to dynamically generate DAGs in Airflow. https://github.com/astronomer/dynamic-dags-tutorial