Dynamic DAGs

Hosted By

  • Kenten Danas
  • Viraj Parekh

Note: This webinar was recorded in June 2021. Since then, Airflow 2.3 was released, adding dynamic tasks to Airflow, which has changed and improved many of the patterns shown in this webinar. For the latest best practices around dynamic tasks, please check out our newer Dynamic Tasks in Airflow webinar, the Astronomer Academy module Airflow: Dynamic Task Mapping, and our Create dynamic Airflow tasks guide. For information on how to dynamically generate DAGs, see the Dynamically generate DAGs in Airflow guide and the Airflow: Dynamic DAGs academy module.

The simplest way of creating an Airflow DAG is to write it as a static Python file. However, sometimes manually writing DAGs isn’t practical.

Maybe you have hundreds or thousands of DAGs that do similar things, with just a parameter changing between them. Or maybe you need a set of DAGs to load tables, but don’t want to manually update DAGs every time those tables change.

In these cases, and others, it can make more sense to dynamically generate DAGs. Because everything in Airflow is code, you can dynamically generate DAGs using Python alone.

In this webinar, we’ll talk about when you might want to dynamically generate your DAGs, show a couple of methods for doing so, and discuss problems that can arise when implementing dynamic generation at scale.

In this webinar we cover:

Generating DAGs - The Static Way

Most people who have used Airflow are familiar with defining DAGs statically.

You create a Python file, instantiate your DAG, and define your tasks.

But What Actually Makes a DAG?

A DAG is dynamically generated when parsing the same DAG file can produce different results from one parse to the next.

Why is this useful?

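Dynamically generating DAGs can be helpful when you have many DAGs that follow a similar pattern, for example DAGs that differ only in a parameter, or a set of DAGs that load tables and would otherwise need a manual update every time those tables change.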

Ways to Dynamically Generate DAGs: Single File

Create a Python script that lives in your DAG_FOLDER that generates DAG objects.

You may have a function that creates the DAG based on some parameters, and then a loop that calls that function for each input.
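As a minimal sketch of this pattern, the example below uses a create_dag function and a loop over a hard-coded list of table names. The table names, the load_<table> naming scheme, and the StandInDAG class are all assumptions made for illustration: StandInDAG is a hypothetical stand-in so the sketch runs without Airflow installed, and in a real DAG file you would construct airflow.DAG objects instead.

```python
from datetime import datetime


class StandInDAG:
    """Hypothetical stand-in for airflow.DAG so this sketch runs anywhere."""

    def __init__(self, dag_id, start_date, tags=None):
        self.dag_id = dag_id
        self.start_date = start_date
        self.tags = tags or []


def create_dag(table, dag_cls=StandInDAG):
    """Build one DAG object for one input parameter (here, a table name)."""
    return dag_cls(
        dag_id=f"load_{table}",
        start_date=datetime(2021, 1, 1),
        tags=[table],
    )


# Assumed input: in practice this list could be read from any source
# that is available when the file is parsed.
TABLES = ["customers", "orders", "payments"]

for table in TABLES:
    dag = create_dag(table)
    # Airflow looks for DAG objects in the module's global namespace,
    # so each generated DAG is registered there under its own name.
    globals()[dag.dag_id] = dag
```

Adding a table name to the list is then enough to produce another DAG the next time the file is parsed; no new DAG file is needed.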

Those parameters may come from:

Ways to Dynamically Generate DAGs: Multiple Files

Create a Python script (or other script) that actually generates DAG .py files, which are then loaded into your Airflow environment.

This is most straightforward if you are parameterizing the same DAG structure and want to automatically read those parameters from YAML, JSON, etc.
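A minimal sketch of such a generator script is shown below, assuming the parameters live in a JSON document and every DAG shares one template. The config keys (dag_id, schedule, query), the template contents, and the generate_dag_files helper are hypothetical names chosen for illustration. The template uses the Airflow 2.x-era schedule_interval argument and BashOperator import path, which may differ in other Airflow versions. The generator itself only needs the Python standard library.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Template for one generated DAG file. The $-placeholders are filled in
# with repr()-quoted values so the output is valid Python source.
DAG_TEMPLATE = Template('''\
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id=$dag_id,
    start_date=datetime(2021, 1, 1),
    schedule_interval=$schedule,
    catchup=False,
) as dag:
    BashOperator(task_id="run_query", bash_command=$command)
''')

# Assumed input: one JSON object per DAG (this could equally be a YAML file).
CONFIG_JSON = """
[
  {"dag_id": "load_orders", "schedule": "@daily", "query": "SELECT 1"},
  {"dag_id": "load_customers", "schedule": "@hourly", "query": "SELECT 2"}
]
"""


def generate_dag_files(config_text, out_dir):
    """Write one DAG .py file per config entry and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cfg in json.loads(config_text):
        source = DAG_TEMPLATE.substitute(
            dag_id=repr(cfg["dag_id"]),
            schedule=repr(cfg["schedule"]),
            command=repr("echo " + cfg["query"]),
        )
        path = out_dir / f"{cfg['dag_id']}.py"
        path.write_text(source)
        written.append(path)
    return written


# Demo: generate into a temporary directory instead of a real DAG_FOLDER.
with tempfile.TemporaryDirectory() as tmp:
    paths = generate_dag_files(CONFIG_JSON, Path(tmp))
    generated = {p.name: p.read_text() for p in paths}
```

Because each output file is an ordinary static DAG file, the generator can run outside the scheduler (for example in a deployment step), and Airflow only ever sees the finished .py files.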

Pros and Cons

Scalability

Any top-level code in the DAG_FOLDER is executed every time the scheduler parses those files, which happens repeatedly for as long as the scheduler runs. Methods where that code dynamically generates DAGs, such as the single-file method, are more likely to cause performance issues at scale.

If parsing the DAG files takes longer than the scheduler heartbeat interval, the scheduler can get locked up and tasks won’t be executed.
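One rough way to check for this is to time the DAG-generating code and compare it with the heartbeat interval. The sketch below is an illustration only: build_dags is a hypothetical placeholder for whatever top-level code generates your DAGs, and the 5-second value is an assumed heartbeat interval that should be replaced with the value configured in your environment.

```python
import time

# Assumed value for illustration; use the interval configured for your scheduler.
HEARTBEAT_INTERVAL_SECONDS = 5.0


def build_dags(n):
    """Hypothetical placeholder for the top-level code that generates n DAGs."""
    return {f"dag_{i}": {"dag_id": f"dag_{i}"} for i in range(n)}


def time_parse(build_fn, *args):
    """Run the DAG-generating code once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = build_fn(*args)
    return result, time.perf_counter() - start


dags, elapsed = time_parse(build_dags, 1000)
too_slow = elapsed > HEARTBEAT_INTERVAL_SECONDS
print(f"generated {len(dags)} DAGs in {elapsed:.4f}s; exceeds heartbeat: {too_slow}")
```

Real generation code usually spends its time on whatever it reads at parse time (files, databases, APIs), so the measurement is most useful when taken against the actual inputs.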

Community Tools

A notable tool from the community for dynamically creating DAGs is dag-factory. dag-factory is an open source Python library for dynamically generating Airflow DAGs from YAML files.

https://github.com/ajbosco/dag-factory

Code Examples

This repo contains an Astronomer project with multiple examples showing how to dynamically generate DAGs in Airflow.

https://github.com/astronomer/dynamic-dags-tutorial
