The Astro Cloud IDE: from Python and SQL to nearly 1,000 Airflow operators

  • Julian LaNeve

Astronomer released the Cloud IDE in December 2022 with support for Python and SQL cell types. This was a great starting point: we wanted to make it easy for data scientists and analysts to access workflow management without feeling like they were learning a new tool. But as we put the Cloud IDE into the hands of our customers, we realized that it could be so much more than a traditional Python and SQL notebook. Iteratively writing and testing Airflow tasks as notebook cells appealed to a wide range of developers, including data engineers. And data engineers demanded more functionality: they understood how powerful Airflow can be, and wanted to couple that with the iterative development style provided by the Cloud IDE.

So today, we couldn’t be more excited to announce that we’ve added almost 1,000 different cell types to the Cloud IDE. Almost every open source Airflow operator is now available to anyone with access to Astro, whether you’re a long-time Astronomer customer or just starting your free trial (speaking of which - if you haven’t heard, we’ve just added a 14-day free trial; go check it out!).

To expose open source operators in the Cloud IDE, we developed a robust and extensible cell type framework. Not only does this support the open source operators, it also gives users a way to define their own Cloud IDE cell types based on their private Airflow operators, with no code changes necessary. Keep reading to learn more about this new functionality!

Leverage the full Airflow ecosystem in your Cloud IDE DAGs

Part of the reason Airflow became (and continues to be) so popular is its ecosystem of pre-built operators. If you’re working with common data tools, chances are there’s an operator that already exists with functionality to support your use case. But using these operators can be difficult - especially because there are so many to keep track of. Using an operator in your DAG involves importing the Python module and instantiating it with certain parameters. But how do you know where to import it from? And once you have it imported, how do you know which fields you’re required to fill out? More importantly, how do you know what to put in each field without pulling up the documentation to understand what the fields do?

Importing and documentation aside, using these Airflow operators really entails filling out a set of fields with values specific to your use case. And that’s exactly what the Cloud IDE makes extremely simple. To understand how it works, let’s work through a scenario: triggering a Databricks notebook from Airflow.

First, click on the Add Cell menu in the top left of your notebook. You’ll see a full, searchable list of the cell types available to you. Find the cell type you want to use and click it to add it to your notebook. It’s as simple as that!

add-cell.gif

When you add a new cell, the Cloud IDE automatically checks whether the underlying operator requires any additional Python packages. If so, the IDE automatically adds them for you. In this case, the package apache-airflow-providers-databricks appears in our requirements section after we add the cell.
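To make the package check concrete, here is a minimal sketch of how a tool could work out which provider package an operator needs from its import path. This is not the Cloud IDE's actual implementation. It relies on the Airflow convention that modules under airflow.providers are shipped in PyPI packages named apache-airflow-providers-<name>; the function names and the set of sub-package names below are assumptions made for illustration.

```python
from typing import List, Optional

# Sub-packages that mark the end of the provider name in a module path.
# This set is an assumption for illustration, not an exhaustive list.
_COMPONENT_DIRS = {"operators", "sensors", "hooks", "transfers", "triggers"}


def provider_package(module_path: str) -> Optional[str]:
    """Guess the PyPI package that ships an Airflow provider module.

    Returns None for modules outside ``airflow.providers`` (core Airflow).
    """
    parts = module_path.split(".")
    if parts[:2] != ["airflow", "providers"]:
        return None
    name_parts = []
    for part in parts[2:]:
        if part in _COMPONENT_DIRS:
            break
        name_parts.append(part)
    if not name_parts:
        return None
    return "apache-airflow-providers-" + "-".join(name_parts)


def add_requirement(requirements: List[str], module_path: str) -> List[str]:
    """Return the requirements list with the provider package added if missing."""
    package = provider_package(module_path)
    if package is None or package in requirements:
        return requirements
    return requirements + [package]


# Adding a Databricks cell to a notebook whose requirements only list pandas.
print(add_requirement(["pandas"], "airflow.providers.databricks.operators.databricks"))
```

The helper returns a new list rather than mutating its input, so calling it again for a second cell from the same provider leaves the requirements unchanged.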

Your cell in the Cloud IDE contains a form field for each parameter in the underlying Airflow operator. Enter the necessary fields and click the run button in the upper right corner of the cell to test it! In the first run attempt, we get an error message indicating that we’re missing a required field. After filling out the missing field, we can immediately run the cell again. No more switching back and forth between your code and the Airflow UI!

fix-failed-run.gif
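The idea of one form field per operator parameter, plus an error for a missing required field, can be sketched with Python's standard inspect module. This is an illustration only, not how the Cloud IDE is built. NotebookRunOperator is a hypothetical stand-in class, not a real Airflow operator, and form_fields and missing_required are names invented for this example.

```python
import inspect


class NotebookRunOperator:
    """Hypothetical stand-in for an Airflow operator; not a real class."""

    def __init__(self, task_id, notebook_path, conn_id="databricks_default"):
        self.task_id = task_id
        self.notebook_path = notebook_path
        self.conn_id = conn_id


def form_fields(operator_cls):
    """Describe one form field per constructor parameter."""
    fields = []
    for name, param in inspect.signature(operator_cls).parameters.items():
        # *args and **kwargs cannot be rendered as a single named field.
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue
        # A parameter without a default value must be filled in by the user.
        required = param.default is inspect.Parameter.empty
        fields.append({
            "name": name,
            "required": required,
            "default": None if required else param.default,
        })
    return fields


def missing_required(operator_cls, values):
    """List required fields that are still empty in the submitted values."""
    return [
        f["name"]
        for f in form_fields(operator_cls)
        if f["required"] and values.get(f["name"]) in (None, "")
    ]


values = {"task_id": "run_notebook"}
print(missing_required(NotebookRunOperator, values))  # ['notebook_path']

values["notebook_path"] = "/Shared/example"  # hypothetical notebook path
print(missing_required(NotebookRunOperator, values))  # []
task = NotebookRunOperator(**values)
```

Checking the values before instantiating the class is what lets a form report the missing field by name instead of surfacing a raw TypeError from the constructor.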

We just went from 0 to successfully triggering a Databricks notebook in just a few minutes without having to leave the IDE to test tasks or find documentation. Running a Databricks notebook is just one of almost 1,000 cell types now available - you can try the rest out for yourself today with a free trial on Astro.

Exposing custom operators to your organization in the Cloud IDE

While there’s an extensive list of open-source Airflow operators that are quite flexible, many organizations build their own custom Airflow operators to interact with custom data sources, abstract authentication, and set common configuration, among other things. These custom operators are then consumed by Airflow DAG authors as they create and update DAGs. Getting the custom operators in the hands of a data engineer who understands how to work with Airflow has never been an issue, but what if you want to expose that functionality to non-data engineers?

The Cloud IDE now provides functionality to create and use custom cell types based on custom Airflow operators. Custom cell type definitions are very flexible, but at a minimum you need to supply:

  • A name for the cell type
  • An import path indicating where to find the custom Airflow operator
  • A list of parameters to prompt the user for

In addition to those fields, there are many optional fields to customize the behavior of the cell types. For more information on what you can customize, check out the custom cell type reference documentation.
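As a rough mental model, the three minimum fields can be pictured as a small data structure. The sketch below is hypothetical: the class names, field names, and validation rules are assumptions for illustration and do not reflect the actual definition format, which is described in the reference documentation. The import path is resolved with the standard importlib module, using a standard-library class as a stand-in for a custom operator.

```python
import importlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class CellParameter:
    """One value to prompt the user for (hypothetical shape)."""
    name: str
    required: bool = False


@dataclass
class CellTypeDefinition:
    """Minimal custom cell type: a name, an import path, and parameters."""
    name: str
    import_path: str  # dotted path, e.g. "package.module.ClassName"
    parameters: List[CellParameter] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the definition is usable."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        module_path, _, class_name = self.import_path.rpartition(".")
        if not module_path or not class_name:
            problems.append("import_path must look like 'package.module.ClassName'")
        return problems

    def load_operator(self):
        """Import the module and return the class the import path points to."""
        module_path, _, class_name = self.import_path.rpartition(".")
        module = importlib.import_module(module_path)
        return getattr(module, class_name)


# A standard-library class stands in for a private operator here.
cell_type = CellTypeDefinition(
    name="Ordered mapping",
    import_path="collections.OrderedDict",
    parameters=[CellParameter(name="items", required=False)],
)
print(cell_type.validate())
print(cell_type.load_operator())
```

Keeping the operator as an import path rather than a class reference is what allows a definition like this to be stored as plain data and resolved only when a cell runs.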

Let’s take a look at how Astronomer’s data team uses custom cell types in the Cloud IDE to provide easy access to internally-maintained custom operators. Our data team has created a custom operator, AbsqlOperator, that provides a more opinionated way of running SQL. In addition to specifying the query itself, a user can specify a schema, parameterized replacements, and materialization methods. Exposing the operator in the Cloud IDE is very straightforward: a member of the data platform team created the cell type and let the team know when it was ready.

custom-cell-type.webp

Once the cell type was defined, members of the data science and analytics team started using this cell in their pipelines. For an end user, the experience is seamless - they click on the cell type in the Add Cell menu, fill out the fields they’ve been prompted for, click run, and they’re good to go!

cct-cell.webp

With this functionality now in the Cloud IDE, our data team can expose custom Airflow operators so that anyone at Astronomer can use them, not just engineers familiar with Airflow.

The new cell types make the Cloud IDE significantly more powerful than before, and we’re excited to see what you use them for. If you have any questions, let us know! Otherwise, check it out for yourself – for free – on Astro today.
