Configure Dag bundles for Remote Execution
Overview
Dag bundles, introduced in Airflow 3, are collections of files containing Dag code and supporting files. The two types of bundles are:
GitDagBundle
LocalDagBundle
LocalDagBundle is the default dag bundle type for the dagBundleConfigList config option in the Remote Execution Agent Helm chart, but you can alternatively configure a git connection with GitDagBundle for extended versioning capabilities. See GitDagBundle compared to LocalDagBundle for the functional differences between the two bundle types.
Learn more about dag versions and dag bundles in Airflow dag versioning.
Configure Dag sources with GitDagBundle for Remote Execution
GitDagBundle is recommended for production Remote Execution Deployments.
Remote Execution mode requires you to configure a dag bundle backend so that the Remote Execution Agents in your environment can access your code and run your pipelines. The following configuration enables your Remote Execution Agent to fetch dags from a private git repo using Airflow's GitDagBundle. This requires configuring both the dag bundle and appropriate authentication. If you are connecting your Remote Execution Agents to dags in a public repository, you do not have to configure a connection.
In the values.yaml file, define the git connection under commonEnv and configure the dagBundleConfigList as follows:
commonEnv:
  - name: AIRFLOW_CONN_GIT_DEFAULT
    value: '{"conn_type": "git", "login": "<username>", "password": "<access_token>", "host": "<github.com/your-org/private-dags>"}'
dagBundleConfigList: '[{"name": "<private_repo>", "classpath": "airflow.providers.git.bundles.git.GitDagBundle", "kwargs": {"repo_url": "<https://github.com/your-org/private-dags>", "tracking_ref": "main", "subdir": "<dags-folder>", "git_conn_id": "git_default"}}]'
For secure production environments, store the connection string in a secrets backend rather than in plain text in values.yaml.
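Both values in the configuration above are JSON documents embedded in YAML strings, so a misplaced quote or comma is easy to introduce when editing them by hand. As a minimal sketch, the strings could be generated with Python's json module and pasted into values.yaml. The helper names git_connection_value and git_bundle_config_list are hypothetical and not part of Airflow or the Helm chart; the keys and classpath are copied from the example above.

```python
import json

# Classpath taken from the example configuration in this section.
GIT_BUNDLE_CLASSPATH = "airflow.providers.git.bundles.git.GitDagBundle"


def git_connection_value(login, password, host):
    """Return the JSON string used as the AIRFLOW_CONN_GIT_DEFAULT value.

    Hypothetical helper: it only serializes the same keys shown in the
    example above.
    """
    return json.dumps(
        {"conn_type": "git", "login": login, "password": password, "host": host}
    )


def git_bundle_config_list(name, repo_url, tracking_ref="main",
                           subdir="dags", git_conn_id="git_default"):
    """Return a dagBundleConfigList JSON string with one GitDagBundle entry.

    Hypothetical helper: the default subdir "dags" is an illustrative
    assumption, not a chart default.
    """
    entry = {
        "name": name,
        "classpath": GIT_BUNDLE_CLASSPATH,
        "kwargs": {
            "repo_url": repo_url,
            "tracking_ref": tracking_ref,
            "subdir": subdir,
            "git_conn_id": git_conn_id,
        },
    }
    # The config option holds a JSON list, even for a single bundle.
    return json.dumps([entry])


if __name__ == "__main__":
    # Placeholder values; replace them with your own repository details.
    print(git_connection_value("<username>", "<access_token>",
                               "github.com/your-org/private-dags"))
    print(git_bundle_config_list("private_repo",
                                 "https://github.com/your-org/private-dags"))
```

Each printed string can be wrapped in single quotes and used as the corresponding YAML value, as in the example above.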
Configure Dag sources with LocalDagBundle for Remote Execution
LocalDagBundle is the default Dag bundle type for dagBundleConfigList.
By default, LocalDagBundle looks for dags in the /dags folder, but you can configure it to look elsewhere with the path argument in "kwargs".
For example:
dagBundleConfigList: '[{"name": "<dags-folder>", "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", "kwargs": {"path": "<path/to/dags-folder>"}}]'
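Because dagBundleConfigList is a JSON list stored inside a YAML string, it can help to check the value before deploying. The following sketch assumes each entry should have the same three keys as the examples in this section (name, classpath, kwargs) and that bundle names should be unique. The function check_bundle_config is a hypothetical helper, not Airflow's own validation.

```python
import json

# Keys that every entry has in the examples in this section (an assumption
# about what a well-formed entry looks like).
EXPECTED_KEYS = {"name", "classpath", "kwargs"}


def check_bundle_config(raw):
    """Return a list of problems found in a dagBundleConfigList string.

    An empty list means no problems were detected by these basic checks.
    """
    try:
        bundles = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(bundles, list):
        return ["top-level value must be a JSON list"]

    problems = []
    seen_names = set()
    for index, bundle in enumerate(bundles):
        if not isinstance(bundle, dict):
            problems.append(f"entry {index} is not a JSON object")
            continue
        missing = EXPECTED_KEYS - bundle.keys()
        if missing:
            problems.append(f"entry {index} is missing keys: {sorted(missing)}")
        name = bundle.get("name")
        if name in seen_names:
            problems.append(f"entry {index} reuses the bundle name {name!r}")
        seen_names.add(name)
    return problems


if __name__ == "__main__":
    example = (
        '[{"name": "dags-folder", '
        '"classpath": "airflow.dag_processing.bundles.local.LocalDagBundle", '
        '"kwargs": {"path": "/path/to/dags-folder"}}]'
    )
    print(check_bundle_config(example))
```

The path /path/to/dags-folder is a placeholder. The check only covers the shape of the JSON and does not confirm that the classpath can be imported or that the path exists on the agent.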
GitDagBundle compared to LocalDagBundle
Both bundle types support tracking dag versions in the UI. However, GitDagBundle provides further functionality, such as the ability to rerun specific versions of the Dag. The table below describes the functional differences:
Scenario | LocalDagBundle | GitDagBundle |
---|---|---|
Viewing previous dag runs in the UI | The dag graph and code tab displays the dag that existed at the time of the dag run. | The dag graph and code tab displays the dag that existed at the time of the dag run. |
Creating an entirely new dag run | Uses the current dag code. | Uses the current dag code. |
Rerunning a whole previous dag run | Uses the current dag code. | The scheduler uses the dag version that existed at the time of the dag run to determine which task instances to create. The workers use the code contained in the dag bundle version that existed at the time of the original dag run to run their tasks. |
Rerunning individual tasks of a previous dag run | Uses the latest version for tasks that are rerun. | Uses the code of the task contained in the dag bundle version at the time of the original dag run. |
Changing code while a dag is running | The dag always uses the current dag code at the time it starts a task, like in Airflow 2. | The dag run finishes using the bundle version it was started with. |
Running a backfill | Uses the current dag code. | Uses the latest bundle version. |
Making code changes | Every structural change to the dag creates a new dag version. | Every committed/saved structural change creates a new dag version. This means with every new bundle version, all dags that have had structural changes will also have a new dag version. |