Airflow in Action: Data Engineering Insights from Circle, Managing $10s of Billions Across Multiple Blockchains
At the Airflow Summit, Nathaniel Rose, Staff Software Engineer at Circle, shared the company’s experience using Apache Airflow® to manage the complexities of blockchain data orchestration. The session covered Circle’s approach to scaling their data platform, implementing CI/CD practices, overcoming the challenges of a managed Airflow environment, and why they’re now exploring Astro, the enterprise-grade managed Airflow service from Astronomer.io.
Stablecoin, Blockchain, and Circle
Circle is a leader in digital financial technology, best known for minting USDC (USD Coin), a stablecoin backed by the U.S. dollar. With over $35 billion in circulating USDC, Circle uses blockchains for near-instant settlement and global financial accessibility. This creates the data challenge of ingesting, indexing, and analyzing massive volumes of data from multiple blockchains such as Ethereum, Base, and Solana. On average, Circle runs 200,000 Airflow tasks every hour to support critical business insights, compliance, and auditing.
Figure 1: The data engineering ecosystem at Circle. Image source.
The Data Platform Challenge
Circle’s rapid growth, from 20 to 70 data team members in two years, exacerbated their data orchestration complexities. Their existing setup on Amazon Managed Workflows for Apache Airflow (MWAA) presented several challenges, including scalability limits and frequent web server outages that returned HTTP 500 errors. These outages disrupted development workflows, making it difficult for engineers to run backfills or test DAGs efficiently.
To address these issues, Circle proposed centralizing their deployment processes while implementing CI/CD practices. This included:
- Containerizing their Airflow environment to improve portability across local machines, GitHub Actions, and the managed service.
- Automating CI/CD workflows for DAG linting, static testing, and deployment to S3 buckets.
- Introducing a static testing framework to enable smoke tests, integration tests, and Airflow Variable consistency checks across dev, staging, and production environments.
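As one illustration of the DAG linting and static-testing stage, the sketch below parses DAG files with Python's `ast` module, so nothing is imported or executed, and flags syntax errors plus `Variable.get()` calls that run at parse time. Airflow's best-practices guidance discourages such top-level calls because module-level code runs on every DAG parse. This is an assumption about what such a lint could check, not Circle's actual framework; `lint_dag_source`, `LintIssue`, and `_ParseTimeVariableGet` are hypothetical names.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class LintIssue:
    path: str
    line: int
    message: str


class _ParseTimeVariableGet(ast.NodeVisitor):
    """Collects Variable.get(...) calls that run when the DAG file is parsed."""

    def __init__(self) -> None:
        self.lines: list[int] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Decorators and default argument values are evaluated at parse time...
        for expr in node.decorator_list + node.args.defaults:
            self.visit(expr)
        for expr in node.args.kw_defaults:
            if expr is not None:
                self.visit(expr)
        # ...but the body only runs later, when the task executes, so skip it.

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Lambda(self, node: ast.Lambda) -> None:
        pass  # skip lambdas entirely; their bodies are deferred too

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "get"
            and isinstance(func.value, ast.Name)
            and func.value.id == "Variable"
        ):
            self.lines.append(node.lineno)
        self.generic_visit(node)


def lint_dag_source(path: str, source: str) -> list[LintIssue]:
    """Statically check one DAG file without importing or executing it."""
    try:
        tree = ast.parse(source, filename=path)
    except SyntaxError as exc:
        return [LintIssue(path, exc.lineno or 0, f"syntax error: {exc.msg}")]
    finder = _ParseTimeVariableGet()
    finder.visit(tree)
    return [
        LintIssue(path, line, "Variable.get() at parse time hits the metadata DB on every parse")
        for line in finder.lines
    ]
```

In a CI job, a test could walk the DAGs folder, read each file, and fail if `lint_dag_source` returns any issues; because it never imports the DAG, it runs without an Airflow installation or database.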
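An Airflow Variable consistency check across dev, staging, and production can be as simple as comparing key sets. The sketch below assumes each environment's Variables are available as a JSON object of `{key: value}` (for instance, the file written by `airflow variables export`) and reports, per environment, any key defined elsewhere but missing there. Only keys are compared, since values such as endpoints legitimately differ per environment. The function names and Variable keys are hypothetical.

```python
from __future__ import annotations

import json
from collections.abc import Mapping


def keys_from_export(json_text: str) -> set[str]:
    """Variable keys from a JSON object of {key: value}, e.g. an exported Variables file."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping Variable keys to values")
    return set(data)


def find_variable_drift(envs: Mapping[str, set[str]]) -> dict[str, list[str]]:
    """For each environment, list keys that another environment defines but it lacks."""
    expected: set[str] = set().union(*envs.values())
    drift: dict[str, list[str]] = {}
    for env, keys in envs.items():
        missing = sorted(expected - keys)
        if missing:
            drift[env] = missing
    return drift


if __name__ == "__main__":
    envs = {
        "dev": keys_from_export('{"rpc_url_ethereum": "v", "rpc_url_solana": "v", "warehouse_db": "v"}'),
        "staging": keys_from_export('{"rpc_url_ethereum": "v", "warehouse_db": "v"}'),
        "prod": keys_from_export('{"rpc_url_ethereum": "v", "rpc_url_solana": "v", "warehouse_db": "v"}'),
    }
    print(find_variable_drift(envs))  # {'staging': ['rpc_url_solana']}
```

A CI test would then simply assert that `find_variable_drift(...)` returns an empty dict before a deployment is allowed to proceed.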
Key Learnings Along the Way
Circle’s use of MWAA highlighted several limitations. For instance, MWAA’s restriction to the Celery Executor hindered scalability and prevented seamless integration of custom Docker images. The team also struggled to maintain shared development environments, where one engineer’s change to a shared dependency could disrupt everyone else’s work.
To mitigate these issues, Circle implemented:
- Enhanced testing pipelines for their curated operators, which are used in 70% of their DAGs.
- Data quality frameworks to validate queries, catch errors early, and maintain data integrity.
- Strategies to manage MWAA’s web server outages, such as staggering DAG runs, increasing connection pool sizes, and upgrading Airflow to version 2.6.
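A data quality framework that validates queries can be sketched as a list of SQL checks, each returning a count of violating rows, run before downstream tasks consume the data. The example below uses an in-memory `sqlite3` database so it runs standalone; the `transfers` table, its columns, and the check names are hypothetical, and Circle's actual warehouse and framework are not described in the source. Within Airflow itself, comparable checks are available through the common-sql provider's check operators, such as `SQLColumnCheckOperator` and `SQLTableCheckOperator`.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityCheck:
    name: str
    # Query must return exactly one row with one integer: the number of violations.
    sql: str


# Hypothetical checks for a hypothetical `transfers` table.
TRANSFER_CHECKS = [
    QualityCheck("null_tx_hash", "SELECT COUNT(*) FROM transfers WHERE tx_hash IS NULL"),
    QualityCheck("negative_amount", "SELECT COUNT(*) FROM transfers WHERE amount < 0"),
    QualityCheck(
        "duplicate_tx_hash",
        "SELECT COUNT(*) FROM (SELECT tx_hash FROM transfers "
        "WHERE tx_hash IS NOT NULL GROUP BY tx_hash HAVING COUNT(*) > 1)",
    ),
]


def run_quality_checks(conn: sqlite3.Connection, checks: list[QualityCheck]) -> list[str]:
    """Run every check; return a message for each one that found violations."""
    failures = []
    for check in checks:
        (violations,) = conn.execute(check.sql).fetchone()
        if violations:
            failures.append(f"{check.name}: {violations} violation(s)")
    return failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfers (tx_hash TEXT, chain TEXT, amount NUMERIC)")
    conn.executemany(
        "INSERT INTO transfers VALUES (?, ?, ?)",
        [("0xa1", "ethereum", 100), ("0xa1", "ethereum", 100), (None, "solana", -5)],
    )
    # Each of the three checks finds one violation in this sample data.
    print(run_quality_checks(conn, TRANSFER_CHECKS))
```

In a pipeline, a non-empty result would typically fail the task so that bad data is caught early instead of propagating to reports.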
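The source does not say how Circle staggered DAG runs. One common approach, shown here as an assumption, is to give each DAG a stable minute offset instead of scheduling everything `@hourly` (which Airflow runs at minute 0), so runs start spread across the hour rather than in a single burst. `staggered_hourly_cron` is a hypothetical helper.

```python
from __future__ import annotations

import hashlib


def staggered_hourly_cron(dag_id: str, window_minutes: int = 60) -> str:
    """Return an hourly cron expression with a stable, dag_id-derived minute offset.

    Without an offset, every "@hourly" DAG is scheduled at minute 0, so their runs
    start in the same burst. Hashing the dag_id spreads DAGs across the window.
    """
    if not 1 <= window_minutes <= 60:
        raise ValueError("window_minutes must be between 1 and 60")
    # sha256 rather than hash(): Python's built-in str hash is salted per process,
    # so it would give a different minute after every scheduler restart.
    digest = hashlib.sha256(dag_id.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:4], "big") % window_minutes
    return f"{minute} * * * *"


if __name__ == "__main__":
    for dag_id in ["ethereum_transfers", "base_transfers", "solana_transfers"]:
        print(dag_id, staggered_hourly_cron(dag_id))
```

A DAG could then pass the result as its schedule, e.g. `schedule=staggered_hourly_cron("ethereum_transfers")` on Airflow 2.4 or later. Hashing avoids any central registry of offsets, at the cost of occasional collisions between DAGs that land on the same minute.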
Despite these improvements, Circle realized they were reaching the performance limits of MWAA and began to explore alternative managed Airflow solutions.
Why Astronomer
Astronomer’s support for the Kubernetes Executor in the Astro managed service offers Circle the flexibility to use containerized Airflow environments. This enables developers to spin up isolated ephemeral environments for testing DAG changes and tear them down after 24 hours, reducing costs and risks associated with shared environments.
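The 24-hour lifetime of ephemeral test environments boils down to a time-to-live check. The sketch below only decides which environments have expired; the actual teardown call is deliberately left out, since the source does not say how Circle (or Astro) performs it. `EphemeralEnv`, `expired_envs`, and the `pr-…` naming are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TTL = timedelta(hours=24)


@dataclass(frozen=True)
class EphemeralEnv:
    name: str             # e.g. a per-pull-request environment name (hypothetical)
    created_at: datetime  # must be timezone-aware


def expired_envs(envs: list[EphemeralEnv], now: datetime, ttl: timedelta = TTL) -> list[str]:
    """Return names of environments that have lived at least `ttl` and should be torn down."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return sorted(env.name for env in envs if now - env.created_at >= ttl)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    envs = [
        EphemeralEnv("pr-101", now - timedelta(hours=30)),
        EphemeralEnv("pr-102", now - timedelta(hours=3)),
    ]
    print(expired_envs(envs, now))  # ['pr-101']
```

Running a check like this on a schedule keeps the cleanup logic independent of how environments are created, so the same rule applies whether an engineer or a CI job spun the environment up.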
Figure 2: Improving developer productivity and reducing costs, Circle has begun to evaluate the Astro managed service from Astronomer.io. Image source.
In addition, Astro offers:
- Seamless Local Development: Enhanced integration with the Astro CLI allows developers to test DAGs locally with the same connections used in production.
- Streamlined Environments: By reducing the need for multiple staging environments, Astronomer simplifies Circle’s architecture, improving efficiency and cost-effectiveness.
- Data Observability: Astro Observe provides enhanced data lineage, cross-deployment dependency graphs, and SLA tracking for Circle’s Airflow pipelines.
Next Steps
Circle’s journey highlights the power of Apache Airflow for orchestrating blockchain data workflows at scale. By addressing challenges in CI/CD, testing, and scalability, Circle has streamlined their operations while setting the stage for further innovation with Astronomer. You can get all of the details by listening to the replay of Nathaniel’s session, Airflow Blockchain Use Case: Testing, GitOps and Learnings.
The best way to get started with Airflow is to build and run your pipelines on the Astro managed service. You can try Astro out for free here.