
Airflow and dbt: the next chapter


For the past few years, we’ve been iterating on an Apache Airflow® and dbt™
integration with our customers and broader open-source community. Today,
our integration is the most popular way to run dbt and Airflow together,
downloaded more than 1.3m times per month. And now, we’re extending our
commercial platform, Astro, to support dbt Core in an effort to solve the
operational challenges of running Airflow and dbt together for our
customers.

Airflow was created 10 years ago at Airbnb before being open-sourced. It’s
designed to be an extremely flexible and reliable orchestration engine
that comes without opinions. Airflow’s flexible foundation and its
customizability to satisfy a variety of use cases have contributed to its
wild success: Airflow is now downloaded ~30m times a month and is used to
power data platforms everywhere, from leading-edge startups to the Fortune
5.

However, the data ecosystem has changed drastically over the last 10
years. We’ve seen the creation of the data engineering practice, the rise
and fall of the modern data stack, and the mass adoption of generative AI.
These shifts in the ecosystem have naturally come with new, better tooling
to support today’s data teams.

The Rise of Airflow and dbt

dbt Labs, in particular, has done an excellent job of taking software
engineering best practices and applying them to the world of data
transformation. dbt Core, an open
source project by dbt Labs, is the foundation of this and has grown to be
quite popular: it’s downloaded 10m+ times a month and used by data teams
around the world. Because the project’s focus is purely on data
transformation and the analytics engineer, dbt Labs is able to design
interfaces and functionality particularly well suited to those workloads. It
comes opinionated and “batteries included” with native support for things
like data lineage, testing, and documentation.

And it’s been great to see the project’s success! dbt Labs has made a huge
investment in the open-source data ecosystem and we’ve all benefited from
it. And because dbt is exclusively focused on the transformation layer,
it’s common to run dbt as part of a broader data pipeline. In fact, about
a third of Astronomer customers actively use dbt to run transformations,
and we believe this is consistent with the broader open-source Airflow
community.

Astronomer Cosmos Bridges the Observability Gap

dbt Core and Airflow have long been used together. Before last year, most
data engineers would figure out their own way to instrument dbt and
Airflow; most commonly, this meant running the dbt Core CLI in a
BashOperator task to run an entire dbt project as a single Airflow task.
We actually worked with a handful of customers to publish a series of
blogs on how to run the two projects together.
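
As a rough illustration of that single-task pattern (the DAG ID, schedule,
and project path below are hypothetical placeholders, not taken from those
posts), the whole dbt project is invoked through one BashOperator call:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_single_task",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    # The entire dbt project runs (and fails) as one opaque Airflow task,
    # so the Airflow UI shows no per-model or per-test detail.
    BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /usr/local/airflow/dbt/my_project",
    )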

There ended up being so much demand for a native integration that we
turned our domain knowledge into a Python package, called
Cosmos, that gave our
community an extremely simple-yet-flexible way to run dbt projects in
Airflow while maintaining full observability into the project.
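
As a minimal sketch of what that looks like (the project path, profile
name, and connection ID are illustrative placeholders), Cosmos can expand a
dbt project into a DAG where each model and test becomes its own Airflow
task:

from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

# Map an existing Airflow connection to a dbt profile (placeholder values).
profile_config = ProfileConfig(
    profile_name="my_profile",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="postgres_default",
        profile_args={"schema": "public"},
    ),
)

# Each dbt model and test becomes its own Airflow task, so failures and
# retries are visible at the model level rather than for the project as a whole.
dbt_project_dag = DbtDag(
    dag_id="dbt_project_dag",
    project_config=ProjectConfig("/usr/local/airflow/dbt/my_project"),
    profile_config=profile_config,
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
)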

Candidly, I first built this at an internal company hackathon as a nice
thought experiment for what could be done. A handful of customers found
the project, liked what they saw and started using it. We received so much
positive feedback that we staffed a team last year to release a 1.0
version and continue iterating on the original vision.

At the start of 2024, Cosmos had around 200k downloads per month, and it’s
exploded in popularity since. Cosmos is now the most popular method to run
dbt Core and Airflow together, downloaded more than 1.3 million times per
month (and growing)! This is more than the official dbt Cloud provider,
which is downloaded around 1 million times per month.

Cosmos is meant to solve the integration problem, but not the
operational challenges of running dbt and Airflow together. And those
challenges aren’t easy to solve. Organizations typically have different
teams working with dbt and Airflow, they maintain and test the code separately,
and these dbt projects can grow to be quite large. Our customers (particularly
the larger, multi-team ones) kept coming to us asking for advice on how to
manage these challenges, and we never had great answers. It was clear there was
an opportunity for us to play a larger role in supporting our customers running
both Airflow and dbt, so we decided to do something about it.

Introducing the Next Chapter: Airflow and dbt on Astro


I’m excited to announce that we now support the ability to deploy dbt
projects to Astro to be run natively in Airflow DAGs, in an effort to
deliver best-in-class
analytics engineering capabilities to our customers. We’re starting with a
deploy-based feature because it’s what our customers and community have
asked for the most. In fact, we asked 150 companies using Airflow (both
open-source users and Astro customers) what their biggest challenge was
and found that the number one issue was managing Airflow and dbt code in
separate repos.

Astronomer is no stranger to supporting open source projects for our
customers. We have a dedicated open-source engineering team that has
contributed over half of the features in the Apache Airflow project, and
we’re an active leader in the roadmap, releases, and community. And so
naturally, with this release of a commercial toolset to orchestrate dbt
projects comes a commitment to our community.

We plan to continue maintaining and building Cosmos, our open-source,
free-to-use integration between Airflow and dbt Core. At the same time, we’re
going to continue investing in our commercial platform, Astro, to ensure
it’s the best place to run Airflow and dbt Core to service the full needs
of our customers.

If you’re a user of both Airflow and dbt, we expect dbt on Astro to give
you a materially better experience than you’d find running these tools
separately. Astro gives you a “single pane of glass” to understand what’s
happening across your data ecosystem, now inclusive of dbt projects,
covering everything from model materialization to tests to dbt docs. It
simplifies your orchestration stack into a single platform so your
organization can deliver data reliably and consistently, across teams. And
it can be significantly cheaper than running them separately because you
can utilize the infrastructure you’re already running for Airflow. In most cases,
running dbt on Astro is an order of magnitude cheaper than other alternatives
because you only pay for the compute that you use.

I’m personally excited for this release because it represents a shift in
our product strategy. We’re committed to building the best platform
possible for our customers to orchestrate their data platforms. We’ve
received great feedback from the one-third of our customers who are
already running dbt with us, and we’re going to be paying attention to
other areas where we feel we can add value to our customers’ data platforms.

We’re ready for the community to try it out. Check out our docs and
experience it yourself with a free trial (you’ll even get up to $20 in
credits to get started)!
