Version: 0.37

Configure git-sync code deploys

You can deploy DAGs to an Astronomer Deployment using git-sync. After setting up this feature, you can deploy DAGs from a Git repository without any additional CI/CD. DAGs deployed with git-sync automatically appear in the Airflow UI without requiring additional action or causing downtime. You can also roll back images with the Software UI and Houston API.

note

Git-sync does not work with OpenShift clusters.

This guide provides details about setup options and the steps for configuring git-sync as a DAG deploy option.

Choose a git-sync strategy

You can choose how your implementation uses git-sync to optimize the speed of your code deploys and how often your Deployment interacts with your GitHub repository. The two main choices are:

  • Repo fetch mode: How the git-sync relay retrieves changes from the configured GitHub repository.
  • Repo share mode: How the git-sync relay propagates changes within the Airflow Deployment, from the git-sync relay Pod in the namespace to DAG directories.

Repo fetch mode

In Poll mode, the git-sync relay downloads changes from the remote GitHub repository at regular intervals. This strategy works well for git-sync configurations that connect to a remote repository with frequent changes on any branch. The tradeoff is that frequently polling a repository that changes often can cause large volumes of network traffic between your Deployments and the repository.
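
As a rough illustration of the poll-mode idea, the following Python sketch checks the head commit of a remote branch and runs a sync only when that commit has changed. It is not the git-sync relay's implementation. The function names and the `sync` callback are hypothetical, and the only external tool it assumes is the standard `git ls-remote` command.

```python
import subprocess
from typing import Callable, Optional


def parse_ls_remote(output: str, branch: str) -> Optional[str]:
    """Return the commit hash for refs/heads/<branch> from `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == wanted:
            return parts[0]
    return None


def remote_head(repo_url: str, branch: str) -> Optional[str]:
    """Ask the remote for the branch head without downloading any objects."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout, branch)


def poll_once(last_synced: Optional[str], current: Optional[str],
              sync: Callable[[str], None]) -> Optional[str]:
    """Run `sync` only when the remote head differs from the last synced commit."""
    if current is not None and current != last_synced:
        sync(current)
        return current
    return last_synced
```

A polling loop would call `remote_head`, pass the result to `poll_once`, and then sleep for the configured interval. The shorter the interval, the more requests reach the remote repository, which is the network tradeoff described above.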

Alternatively, you can configure Webhook mode instead of Poll mode, so that changes are fetched whenever activity occurs in the GitHub repository. This strategy works well for git-sync configurations that connect to a remote repository without frequent activity, because your Deployment does not perform unnecessary checks. If you configure git-sync for a specific branch in a busy repository, the git-sync relay downloads only the changes made to that branch. However, the webhook still fires for every change in the GitHub repository, even though only changes to the configured branch trigger a download.
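
The branch filtering described above can be sketched in Python as follows. This is a minimal illustration, not the relay's actual logic. The `should_fetch` function is hypothetical, and it assumes the standard GitHub push event format, where the event name arrives in the `X-GitHub-Event` header and the payload's `ref` field names the updated branch.

```python
import json


def should_fetch(event_name: str, payload_body: bytes, branch: str) -> bool:
    """Return True only for push events that target the configured branch."""
    if event_name != "push":
        return False
    payload = json.loads(payload_body)
    # GitHub push payloads identify the branch as "refs/heads/<branch>".
    return payload.get("ref") == f"refs/heads/{branch}"
```

Every delivery still reaches the receiver, but only deliveries for which a check like this passes would lead to a download.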

Repo Share Mode

Repo share mode determines whether your DAGs are transmitted over your network or stored on a shared filesystem in a ReadWriteMany (RWX) volume.

If you use a git-daemon configuration:

  • This implementation relies on the git-daemon container serving the repo within the namespace by using the Git protocol on port 9418. As a result, all of your Airflow containers must clone the repo at startup, which can cause significant network use and startup delays with large Git repos.
  • Your Airflow Deployment contains a git-sync relay Pod, which contains both a git-sync container that stores the Git repo and a git-daemon container that serves the local repo to the Airflow Deployment namespace.

Alternatively, you can use the shared-volume configuration:

  • This implementation strategy eliminates git clone activity between pods in the namespace, and instead stores the git repository contents on an RWX storage volume that is mounted into each Airflow pod.
  • The architecture of your Airflow Deployment includes a git-sync relay Pod with a git-sync container, which pulls from the external Git repo. This Pod connects to an RWX volume, where the Git repo is stored.

Prerequisites

To enable the git-sync deploy feature, you need:

  • A Software installation running an OSS Airflow Chart (this is the default for most installations).
  • Permission to push new configuration changes to your Software installation.
  • (Shared volume mode) A ReadWriteMany (RWX) compatible StorageClass volume. Check your cloud provider's documentation for configuration steps.

To configure a git-sync deploy mechanism for a Deployment on Astronomer, you need Workspace Editor permissions.

To deploy DAGs to a Deployment using a git-sync deploy mechanism, you need permission to push code to a Git repository configured for git-sync deploys.

Enable git-sync

Git-sync deploys must be explicitly enabled through the UI for each Airflow Deployment, in both git-daemon and shared-volume modes.

However, for the shared-volume mode, an Astronomer Admin must configure the RWX shared volume storage class name, storageClassName, in the Houston configuration.

For example, update your values.yaml file with the following values, including the name of your RWX-compatible storage class:

astronomer:
  houston:
    config:
      deployments:
        configureDagDeployment: true
        gitSyncDagDeployment: true
        gitSyncRelay:
          storageClassName: <your-RWX-storage>
          repoShareMode: "shared-volume"

Configure your Astronomer Deployment

Workspace editors can configure a new or existing Airflow Deployment to use a git-sync mechanism for DAG deploys. From there, any member of your organization with write permissions to the Git repository can deploy DAGs to the Deployment. To configure a Deployment for git-sync deploys:

  1. In the Software UI, create a new Airflow Deployment or open an existing one.

  2. Go to the DAG Deployment section of the Deployment's Settings page.

  3. For your Mechanism, select Git Sync.

  4. Configure the following values:

    • Repository URL: The URL for the Git repository that hosts your Astro project
    • Branch Name: The name of the Git branch that you want to sync with your Deployment
    • Sync Interval: The time interval between checks for updates in your Git repository, in seconds. A sync is only performed when an update is detected. Astronomer recommends a minimum interval of 60 seconds.
    • DAGs Directory: The directory in your Git repository that hosts your DAGs. Specify the directory's path relative to the repository's root directory. To use your root directory as your DAGs directory, specify this value as ./. Changes outside the DAGs directory in your Git repository must be deployed using astro deploy.
    • Rev: The commit reference of the branch that you want to sync with your Deployment
    • Ssh Key: The SSH private key for your Git repository
    • Known Hosts: The public key for your Git provider, which can be retrieved using ssh-keyscan -t rsa <provider-domain>. For an example of how to retrieve GitHub's public key, refer to Apache Airflow documentation.
    • Repo Fetch Mode: Choose Poll or WebHook. If you select WebHook, you need the Webhook URL and Webhook Secret Key for your GitHub Configuration.
    • Webhook URL: (Webhook mode only)
    • Webhook Secret Key: (Webhook mode only)
    • Ephemeral Storage Overwrite Gigabytes: The storage limit for your Git repository. If your Git repo is larger than 2 GB, Astronomer recommends setting this slider to your repo size plus 1 Gi.
    • Sync Timeout: The maximum number of seconds allowed for a sync. Astronomer recommends increasing this value if your repo is larger than 1 GB.
  5. (Webhook mode only) Open your GitHub repository and set up a repository webhook, or return to your Deployment details page to configure this later. Set the following configurations:

    • Payload URL: Paste the Webhook URL from the Software UI.
    • Content Type: Select JSON.
    • Secret: Paste the Webhook Secret Key from the Software UI.
    • SSL verification: Select Enable SSL verification.
    • Event trigger: Select Just the push event.

  6. Save your changes.

After you configure your Deployment, any code that you push to the DAGs directory of the specified Git repo and branch appears in your Deployment with zero downtime.
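
For context on the Secret field in the webhook settings: GitHub signs each webhook delivery with an HMAC-SHA256 digest of the request body and sends it in the `X-Hub-Signature-256` header, prefixed with `sha256=`. The Python sketch below shows how a receiver could check that signature. How the Astronomer webhook endpoint validates deliveries internally is not described in this guide, so treat this as an assumption-based illustration. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the value GitHub would send in X-Hub-Signature-256 for this body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature_header)
```

A delivery signed with a different secret fails this check, which is why the secret in GitHub must match the Webhook Secret Key shown in the Software UI.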

tip

Newly created DAG files can take up to five minutes (with the default configuration) to appear in the Airflow UI after they sync. To shorten this delay, Astronomer recommends tuning AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL in your Airflow Deployment.
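
The variable name in this tip follows Airflow's environment variable convention, `AIRFLOW__{SECTION}__{KEY}`. The short Python sketch below derives the name from the `scheduler` section and the `dag_dir_list_interval` key. The helper function is hypothetical, and the value of 60 seconds is only an example, not a recommendation from this guide.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Example only: rescan the DAGs directory every 60 seconds instead of the default 300.
env = {airflow_env_var("scheduler", "dag_dir_list_interval"): "60"}
```

Lower values make new DAG files appear sooner, but they also make Airflow scan the DAGs directory more often.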

Configure a Git repo for git-sync deploys

The Git repo you want to sync should contain a directory of DAGs that you want to deploy to Astronomer. You can include additional files in the repo, such as your other Astro project files, but note that this might affect performance when deploying new changes to DAGs.

If you want to deploy DAGs with a private Git repo, you additionally need to configure SSH so that your Astronomer Deployment can access the contents of the repo. This process varies slightly between Git repository management tools. For an example of this configuration, read GitLab's SSH Key documentation.
