Airflow in Action: Making dbt on Airflow Easy with Astronomer Cosmos. Insights from BAM


At the Airflow Summit, Lewis Macdonald (Engineering Manager) and Ethan Stone (Software Engineering) at Balyasny Asset Management (BAM) shared how they streamline data transformations with dbt on Apache Airflow using Astronomer Cosmos.

The session covered BAM’s architecture, challenges with scaling dbt projects across multiple teams, and their innovative solution that empowers diverse user personas to run dbt workflows effortlessly. The talk concluded with a demo showcasing the power of their integration and a look toward the future with Apache Airflow® 3.

BAM’s Data Landscape and Challenges

BAM, a global multi-strategy investment firm, manages over $20 billion in assets with a team of over 400 technologists, analysts, and data scientists. Their data ecosystem ingests thousands of sources—from web scrapes to market data—and processes them through complex, layered transformations.

Figure 1: BAM’s data processing architecture with Airflow orchestrating workflows running on the Astro managed service. Image source.

Initially, BAM’s dbt workflows faced significant hurdles:

  • Complexity and Transparency: Thousands of interdependent data pipelines created a tangled web of dependencies, making it difficult to debug or optimize workflows.
  • Team Collaboration: Different teams managing different transformation stages struggled with handoffs and inconsistent tooling.
  • Development Velocity: It was difficult for teams to test pipelines, which slowed iteration and increased risk.

Lewis and Ethan wanted to make it easy for their team to use dbt Core in production: simplify onboarding, minimize the configuration and setup burden, build on a foundation of strong multi-tenant security, autoscale deployments with Kubernetes, and provide deep observability into the health of their dbt deployments.

The Architecture Behind BAM’s dbt Solution

To address these challenges, BAM built a self-service dbt platform integrating their internal developer platform with Astronomer Cosmos. Cosmos is an open source solution developed by Astronomer that allows data engineers to run dbt Core projects as Airflow DAGs and Task Groups with just a few lines of code.

BAM used Cosmos (which is an integral part of Astro, the fully managed Airflow service from Astronomer) and simplified their architecture in the following ways:

  1. Project Bootstrapping: Teams start with a pre-configured dbt repository provisioned via GitHub, complete with Airflow integration.
  2. Build and Deployment Pipeline: dbt projects are packaged into Docker images and compiled into manifest files, which are stored in a centralized registry for production use.
  3. Cosmos Integration: A wrapper around Cosmos translates dbt manifests into Airflow task groups, with each dbt model running as a Kubernetes pod.
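To make step 3 more concrete, here is a minimal, self-contained sketch of the general idea: read a dbt manifest, keep only the models, and turn each one into a one-pod-per-model task spec wired to its upstream models. This is not Cosmos's implementation or BAM's wrapper, and it does not use the Cosmos API. It assumes a trimmed manifest.json that keeps only the nodes, resource_type, name and depends_on.nodes fields that dbt writes. The task-spec dictionary, the image name, and the helpers model_tasks and execution_order are hypothetical.

```python
import json
from graphlib import TopologicalSorter

# Trimmed, hypothetical manifest.json content. Real dbt manifests hold many
# more fields; this sketch only reads "nodes", "resource_type", "name" and
# "depends_on" -> "nodes".
MANIFEST = json.loads("""
{
  "nodes": {
    "model.demo.stg_prices": {"resource_type": "model", "name": "stg_prices",
      "depends_on": {"nodes": ["source.demo.raw.prices"]}},
    "model.demo.stg_trades": {"resource_type": "model", "name": "stg_trades",
      "depends_on": {"nodes": []}},
    "model.demo.pnl": {"resource_type": "model", "name": "pnl",
      "depends_on": {"nodes": ["model.demo.stg_prices", "model.demo.stg_trades"]}},
    "test.demo.not_null_pnl": {"resource_type": "test", "name": "not_null_pnl",
      "depends_on": {"nodes": ["model.demo.pnl"]}}
  }
}
""")


def model_tasks(manifest: dict, image: str) -> dict:
    """Map every dbt model to a hypothetical one-pod-per-model task spec."""
    nodes = manifest["nodes"]
    models = {uid: n for uid, n in nodes.items() if n["resource_type"] == "model"}
    tasks = {}
    for node in models.values():
        # Keep only upstream models; sources and tests do not become tasks here.
        upstream = [nodes[dep]["name"] for dep in node["depends_on"]["nodes"] if dep in models]
        tasks[node["name"]] = {
            "image": image,  # container image built from the dbt project
            "cmds": ["dbt", "run", "--select", node["name"]],  # run just this model
            "upstream": upstream,
        }
    return tasks


def execution_order(tasks: dict) -> list:
    """Return model names so that every model comes after its upstream models."""
    graph = {name: set(spec["upstream"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    tasks = model_tasks(MANIFEST, "registry.example.com/team-a/dbt:1.0.0")
    print(execution_order(tasks))
```

In this sketch the dbt test node and the upstream source are dropped, and pnl is scheduled only after both staging models. An orchestrator would express the same ordering as task dependencies between the pods.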

Figure 2: dbt workflow from local development to production deployment with task execution on Kubernetes pods, orchestrated by Airflow running in Astro. Image source.

The session featured a demo that showcased the end-to-end process of transforming local dbt projects into pipelines running on Astro. The demo illustrated how BAM’s setup allows users to define dbt models, deploy them, and run transformations with full lineage and observability.

The Role of Astronomer Cosmos

Cosmos plays a pivotal role in simplifying dbt execution in Airflow:

  • Translation Layer: Cosmos parses the dbt manifest and renders it as an Airflow DAG.
  • Dynamic Manifest Caching: The manifest is retrieved and cached, rather than embedded into the Airflow DAG. This allows dbt projects to be decoupled from Airflow deployments.
  • Kubernetes Execution: In line with BAM’s Kubernetes-first approach, each dbt model runs as an individual Airflow task in its own Kubernetes pod. Credentials are injected dynamically via HashiCorp Vault, aligning with BAM’s stringent compliance standards.
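The dynamic manifest caching idea can be illustrated with a small time-based cache. Manifests are fetched from a shared location and reused until they go stale, instead of being baked into the Airflow deployment. This is a sketch under stated assumptions, not Cosmos's or BAM's code. ManifestCache, the injected fetch callable, the example URI and the TTL policy are all hypothetical. Keying on a versioned manifest path would be an equally reasonable design.

```python
import json
import time
from typing import Callable


class ManifestCache:
    """Hypothetical in-process cache for dbt manifests read at DAG-parse time.

    `fetch` is any callable that returns raw manifest JSON for a URI (for
    example, a read from object storage). It is an assumption of this sketch,
    not a Cosmos API.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, uri: str) -> dict:
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the remote read
        # Missing or stale: fetch, parse, and remember when we fetched it.
        manifest = json.loads(self._fetch(uri))
        self._entries[uri] = (now, manifest)
        return manifest
```

In a long-lived process, one ManifestCache instance could be shared across repeated DAG parses. A team could then publish a new manifest without redeploying Airflow, and the change would appear once the TTL expires. How long such state actually survives depends on how the Airflow deployment parses DAG files.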

Figure 3: The Cosmos library translates a dbt model into an Airflow DAG. Image source.

Cosmos Benefits and Future Plans

By harnessing Cosmos on Astro, BAM’s solution delivers significant advantages:

  • Ease of Use: Users with minimal Airflow experience can run dbt workflows without writing DAGs, instead focusing solely on SQL transformations.
  • Enhanced Observability: Detailed lineage, metrics, and logs provide transparency and improve production reliability.
  • Standardization: A centralized documentation hub and consistent practices across teams promote maintainability and productivity.

Looking ahead, BAM is excited about Airflow 3’s decoupling of scheduler and runtime execution, which aligns well with their Kubernetes strategy. They are also collaborating with the Cosmos team at Astronomer to explore further optimizations for large-scale execution.

Next Steps

Curious to learn more from BAM? Watch the Airflow Summit session replay Building on Cosmos: Making dbt on Airflow Easy.

Running dbt with Cosmos gives you flexibility. You can use Cosmos with Apache Airflow, and optionally run it on the Astro managed service. Running on Astro enables you to take advantage of additional functionality for your dbt workflow such as collecting data lineage, deeper observability across your pipelines, and task failure management.

You can learn more about Cosmos and other dbt integrations on our dbt and Airflow page, where you can also sign up for a free trial on the Astro service.
