Using Airflow with Databricks is common in the data ecosystem. Many data teams use Databricks to run heavy workloads such as training machine learning models, transforming data, and analyzing data. Using Airflow as a tool-agnostic orchestrator in combination with Databricks provides several advantages, such as easy integration with other tools in your data stack and the ability to manage your pipelines as code.
Airflow’s open-source functionality makes it easy to orchestrate Databricks Jobs, allowing you to take advantage of Databricks’ Job clusters while maintaining full visibility from Airflow. In this webinar, we show how to use Airflow to orchestrate your Databricks Jobs by using the Airflow Databricks Provider. You can find the code shown in the demo in this repo.