For the past few years, we’ve been iterating on an Apache Airflow® and dbt™ integration with our customers and broader open-source community. Today, our integration is the most popular way to run dbt and Airflow together, downloaded more than 1.3m times per month. And now, we’re extending our commercial platform, Astro, to support dbt Core in an effort to solve the operational challenges of running Airflow and dbt together for our customers.
Airflow was created 10 years ago at Airbnb before being open-sourced. It’s designed to be an extremely flexible and reliable orchestration engine that comes without opinions. Airflow’s flexible foundation and its customizability across a variety of use cases have contributed to its wild success: Airflow is now downloaded ~30m times a month and is used to power data platforms everywhere, from leading-edge startups to the Fortune 5.
However, the data ecosystem has changed drastically over the last 10 years. We’ve seen the creation of the data engineering practice, the rise and fall of the modern data stack, and mass adoption of generative AI. These shifts in the ecosystem have naturally come with new, better tooling to support today’s data teams.
The Rise of Airflow and dbt
dbt Labs, in particular, has done an excellent job of taking software engineering best practices and applying them to the world of data transformation. dbt Core, an open source project by dbt Labs, is the foundation of this and has grown to be quite popular: it’s downloaded 10m+ times a month and used by data teams around the world. Because the project’s focus is purely on data transformation and the analytics engineer, they’re able to design interfaces and functionality particularly suitable for those workloads. It comes opinionated and “batteries included” with native support for things like data lineage, testing, and documentation.
And it’s been great to see the project’s success! dbt Labs has made a huge investment in the open-source data ecosystem and we’ve all benefited from it. And because dbt is exclusively focused on the transformation layer, it’s common to run dbt as part of a broader data pipeline. In fact, about a third of Astronomer customers actively use dbt to run transformations, and we believe this is consistent with the broader open-source Airflow community.
Astronomer Cosmos Bridges the Observability Gap
dbt Core and Airflow have long been used together. Before last year, most data engineers would figure out their own way to instrument dbt and Airflow; most commonly, this meant running the dbt Core CLI in a BashOperator task to run an entire dbt project as a single Airflow task. We actually worked with a handful of customers to publish a series of blogs on how to run the two projects together.
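As a rough illustration of that single-task pattern, the sketch below wraps one dbt Core CLI call in a BashOperator. The paths, DAG id, and schedule are hypothetical, and the import paths assume Airflow 2.4 or later; treat it as a minimal sketch rather than a recommended setup.

```python
import shlex
from datetime import datetime

# Hypothetical locations of the dbt project and profiles (illustration only).
DBT_PROJECT_DIR = "/usr/local/airflow/dbt/my_project"
DBT_PROFILES_DIR = "/usr/local/airflow/dbt"


def dbt_command(subcommand: str) -> str:
    """Build a shell-safe dbt Core CLI invocation for the whole project."""
    return shlex.join([
        "dbt", subcommand,
        "--project-dir", DBT_PROJECT_DIR,
        "--profiles-dir", DBT_PROFILES_DIR,
    ])


def build_dag():
    """Create a DAG in which the entire dbt project is a single Airflow task."""
    # Imported lazily so dbt_command() can be used without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="dbt_single_task",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        # One task runs every model, so Airflow only sees one pass/fail result.
        BashOperator(task_id="dbt_run", bash_command=dbt_command("run"))
    return dag
```

In a real DAG file you would assign the result at module level (for example `dag = build_dag()`) so the scheduler can discover it. The trade-off is visible in the code: Airflow knows whether `dbt run` succeeded, but nothing about the individual models inside it.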
There ended up being so much demand for a native integration that we turned our domain knowledge into a Python package, called Cosmos, that gave our community an extremely simple-yet-flexible way to run dbt projects in Airflow while maintaining full observability into the project.
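To make the observability point concrete, here is a conceptual sketch of what it means to give each dbt model its own task instead of one task for the whole project. It is not Cosmos's implementation or API: the model names and the `plan_tasks` helper are hypothetical, and it only uses the Python standard library to order models by their dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical dbt models mapped to the upstream models they depend on.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def plan_tasks(dependencies):
    """Return (task_id, command) pairs, one per model, in a valid run order."""
    order = TopologicalSorter(dependencies).static_order()
    return [(f"{model}_run", f"dbt run --select {model}") for model in order]


tasks = plan_tasks(MODEL_DEPENDENCIES)
for task_id, command in tasks:
    print(task_id, "->", command)
```

With one task per model, a failure in `orders_enriched` shows up as a failed task with its own logs and retries, rather than being buried in the output of a single project-wide command.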
Candidly, I first built this at an internal company hackathon as a nice thought experiment for what could be done. A handful of customers found the project, liked what they saw and started using it. We received so much positive feedback that we staffed a team last year to release a 1.0 version and continue iterating on the original vision.
At the start of 2024, Cosmos had around 200k downloads per month, and it’s exploded in popularity since. Cosmos is now the most popular method to run dbt Core and Airflow together, downloaded more than 1.3 million times per month (and growing)! This is more than the official dbt Cloud provider, which is downloaded around 1 million times per month.
Cosmos is meant to solve the integration problem, but not the operational challenges of running dbt and Airflow together. And those challenges aren’t easy to solve. Organizations typically have different teams working with dbt and Airflow, they maintain and test the code separately, and these dbt projects can grow to be quite large. Our customers (particularly the larger, multi-team ones) kept coming to us asking for advice on how to manage these challenges, and we never had great answers. It was clear there was an opportunity for us to play a larger role in supporting our customers running both Airflow and dbt, so we decided to do something about it.
Introducing the Next Chapter: Airflow and dbt on Astro
I’m excited to announce that we now support the ability to deploy dbt projects to Astro to be run natively in Airflow DAGs, in an effort to deliver best-in-class analytics engineering capabilities to our customers. We’re starting with a deploy-based feature because it’s what our customers and community have asked for the most. In fact, we asked 150 companies using Airflow (both open-source users and Astro customers) what their biggest challenge was and found that the number one issue was managing Airflow and dbt code in separate repos.
Astronomer is no stranger to supporting open source projects for our customers. We have a dedicated open-source engineering team that has contributed over half of the features in the Apache Airflow project, and we’re an active leader in the roadmap, releases, and community. And so naturally, with this release of a commercial toolset to orchestrate dbt projects comes a commitment to our community.
We plan to continue maintaining and building Cosmos, our open source, free to use integration between Airflow and dbt Core. At the same time, we’re going to continue investing in our commercial platform, Astro, to ensure it’s the best place to run Airflow and dbt Core to service the full needs of our customers.
If you’re a user of both Airflow and dbt, we expect dbt on Astro to give you a materially better experience than you’d find running these tools separately. Astro gives you a “single pane of glass” to understand what’s happening across your data ecosystem, now inclusive of dbt projects. This includes everything from model materialization to tests to dbt docs. It simplifies your orchestration stack into a single platform so your organization can deliver data reliably and consistently, across teams. And it can be significantly cheaper than running them separately because you can utilize the infrastructure you’re already running for Airflow. In most cases, running dbt on Astro is an order of magnitude cheaper than other alternatives because you only pay for the compute that you use.
I’m personally excited for this release because it represents a shift in our product strategy. We’re committed to building the best platform possible for our customers to orchestrate their data platforms. We’ve received great feedback from the one-third of our customers who are already running dbt with us, and we’re going to be paying attention to other areas where we feel we can add value to our customers’ data platforms.
We’re ready for the community to try it out. Check out our docs and experience it yourself with a free trial (you’ll even get $300 in credits to get started)!