Airflow in Action: Data Engineering Insights from Processing PBs of Data Every Day at Stripe

In 2023, Stripe processed $1 trillion of payments, equivalent to 1% of global GDP. It’s not surprising, then, to hear Stripe engineers describe what they build as the “payment infrastructure of the internet”.

As the Stripe Data Orchestration team explained at the Airflow Summit 2024, regulatory compliance in financial services is non-negotiable, and ensuring the integrity of production data is crucial to meet that goal. At the same time, the data and software engineering teams need to enable developers to move fast in building and testing new services. To balance these two demands, Stripe has developed User Scope Mode (USM), a powerful internal tool that allows users to safely and efficiently test new or existing Apache Airflow® data pipelines without the risk of corrupting production data.

In this blog post, we’ll recap key highlights from Stripe’s session at the Summit and point you to further resources to learn more.

Airflow at Stripe by the numbers

Stripe operates Airflow at a huge scale, processing multiple petabytes of data every day. Airflow orchestrates 250 complex pipelines comprising 150,000 tasks that connect Stripe’s operational and analytical systems. It manages workflows across Apache Spark with data transformed into the Apache Iceberg table format and queried via Presto for 500 different teams in the company.

Stripe has maintained its own fork of Airflow 1.10, but is now in the process of moving to Airflow 2.8. Aligning with the upstream mainline project will reduce internal engineering effort and let Stripe’s teams take advantage of newer Airflow features faster.

Stripe’s path to Airflow development

Running mission-critical workloads at massive scale complicates the development of new workflows that need to integrate with the existing production estate. To address this concern, Stripe’s User Scope Mode enables internal teams to rapidly iterate on and improve their pipelines without the complexity of manual configuration or the risk of impacting production operations or data integrity. With USM, Stripe’s engineers can run an Airflow job and compare its results to those of the most recent production job. USM also automatically reconfigures Airflow pipelines so that engineers can develop and test production-ready workflows in a local environment.
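This recap does not spell out USM’s internals, so the following is only a minimal Python sketch of one common way to keep test runs away from production data: redirect every write to a per-user namespace while still reading production inputs. All names here (RunContext, resolve_output_table, resolve_input_table, and the usm_ prefix) are hypothetical illustrations, not Stripe’s or Airflow’s APIs.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class RunContext:
    """Hypothetical run context; not part of Airflow or Stripe's tooling."""
    user: Optional[str] = None  # None means a production run

    @property
    def is_user_scoped(self) -> bool:
        return self.user is not None


def resolve_output_table(table: str, ctx: RunContext) -> str:
    """Return the table a task should write to.

    Production runs write to the real table. User-scoped runs write to a
    per-user copy, so a test run never overwrites production data.
    """
    if not ctx.is_user_scoped:
        return table
    schema, _, name = table.rpartition(".")
    # Assumed naming scheme: prefix the schema with "usm_<user>".
    scoped_schema = f"usm_{ctx.user}_{schema}" if schema else f"usm_{ctx.user}"
    return f"{scoped_schema}.{name}"


def resolve_input_table(table: str, ctx: RunContext, produced_in_scope: Set[str]) -> str:
    """Read the user's own copy if this test run produced it, else production."""
    if ctx.is_user_scoped and table in produced_in_scope:
        return resolve_output_table(table, ctx)
    return table


prod = RunContext()
dev = RunContext(user="alice")
print(resolve_output_table("finance.daily_revenue", prod))
print(resolve_output_table("finance.daily_revenue", dev))
```

In this sketch, a task that calls these helpers instead of hard-coding table names needs no manual reconfiguration to run safely in test: the same DAG code writes to production in a production run and to the user's namespace otherwise.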

Figure 1: Stripe’s Data Change Verifier compares outputs of test to production DAGs. It is able to check for schema changes, differences in computational aggregates in the data, and more.
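To make the two kinds of check named in the caption concrete, here is a small, self-contained Python sketch that compares a production output with a test output for schema changes and for differences in simple aggregates (row count and per-column sums). It illustrates the general technique only: the function names, the in-memory row format, and the choice of aggregates are assumptions, and Stripe’s actual verifier operates on its own production datasets.

```python
import math
from typing import Any, Dict, List

Row = Dict[str, Any]


def infer_schema(rows: List[Row]) -> Dict[str, str]:
    """Map each column to the type name of its first non-null value."""
    schema: Dict[str, str] = {}
    for row in rows:
        for col, val in row.items():
            if val is not None and schema.get(col, "null") == "null":
                schema[col] = type(val).__name__
            else:
                schema.setdefault(col, "null")
    return schema


def aggregates(rows: List[Row], numeric_cols: List[str]) -> Dict[str, float]:
    """Compute the row count and a sum per numeric column, skipping nulls."""
    out = {"row_count": float(len(rows))}
    for col in numeric_cols:
        out[f"sum({col})"] = float(sum(r[col] for r in rows if r.get(col) is not None))
    return out


def verify(prod_rows: List[Row], test_rows: List[Row], rel_tol: float = 1e-9) -> Dict[str, list]:
    """Report schema changes and aggregate differences between two outputs."""
    report: Dict[str, list] = {"schema_changes": [], "aggregate_diffs": []}
    prod_schema, test_schema = infer_schema(prod_rows), infer_schema(test_rows)

    # Columns that were added, dropped, or changed type.
    for col in sorted(set(prod_schema) | set(test_schema)):
        before, after = prod_schema.get(col), test_schema.get(col)
        if before != after:
            report["schema_changes"].append((col, before, after))

    # Only compare aggregates on numeric columns whose type is unchanged.
    shared_numeric = [
        c for c, t in prod_schema.items()
        if t in ("int", "float") and test_schema.get(c) == t
    ]
    prod_agg = aggregates(prod_rows, shared_numeric)
    test_agg = aggregates(test_rows, shared_numeric)
    for name, prod_val in prod_agg.items():
        if not math.isclose(prod_val, test_agg[name], rel_tol=rel_tol):
            report["aggregate_diffs"].append((name, prod_val, test_agg[name]))
    return report


prod_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]
test_rows = [
    {"id": 1, "amount": 10.0, "currency": "usd"},
    {"id": 2, "amount": 7.0, "currency": "usd"},
]
report = verify(prod_rows, test_rows)
print(report)
```

For the sample data, the report flags the new currency column as a schema change and the changed sum of amount as an aggregate difference, while the unchanged row count and id sum pass silently.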

In their session at the Airflow Summit, the Stripe engineers lift the covers on USM internals and show how it has revolutionized their development and testing workflows. They explain how USM integrates with Airflow, enabling users to efficiently validate their pipelines with speed and confidence, all while maintaining strict compliance requirements, permissioning, and data integrity.

Figure 2: USM empowers Stripe’s teams to iterate and refine their workflows without the burden of manual setup or the fear of disrupting live operations.

Next steps

While USM has many benefits, it also has several downsides: it is a very manual process for engineers to follow; separately tested pull requests may conflict when merged and deployed together; and it is difficult to test long-running pipelines or large numbers of tasks. Moving to the Airflow 2.x release train will help address many of these challenges by allowing USM to run as a decoupled system supporting independent deploys, each with its own DAG processor. Beyond Stripe’s talk, it is worth noting that DAG versioning, scheduled for delivery in Airflow 3.0, addresses many of the use cases that USM serves today.

To learn more about the evolution of Airflow at Stripe, watch the session replay Stress-Free Airflow development: From Dev to Prod at Stripe.

Designed to boost developer velocity for your data workflows, Astro’s suite of developer tools allows any engineer to get their pipelines into production faster. This includes Astro’s CI/CD workflows and templates that enable engineers to deploy code securely and reliably whether they have a single environment or multiple environments. This can help reduce and even eliminate custom tooling such as USM. You can get started by evaluating Astro for free today.
