Apache Airflow® Quickstart - Learn Airflow
Learning Airflow: An introduction to Airflow's lean and dynamic pipelines-as-Python-code.
Step 1: Clone the Astronomer Quickstart repository
- Create a new directory for your project and open it:

  ```bash
  mkdir airflow-quickstart-learning && cd airflow-quickstart-learning
  ```

- Clone the repository and open it:

  ```bash
  git clone -b learning-airflow --single-branch https://github.com/astronomer/airflow-quickstart.git && cd airflow-quickstart/learning-airflow
  ```
Your directory should have the following structure:

```text
.
├── Dockerfile
├── README.md
├── dags
│   ├── example_astronauts.py
│   └── example_extract_astronauts.py
├── include
├── packages.txt
├── requirements.txt
├── solutions
│   └── example_astronauts_solution.py
└── tests
    └── dags
        └── test_dag_integrity.py
```
Step 2: Start up Airflow and explore the UI
- Start the project using the Astro CLI:

  ```bash
  astro dev start
  ```

  The CLI will let you know when all Airflow services are up and running.

- If it doesn't launch automatically, navigate your browser to `localhost:8080` and sign in to the Airflow UI using the username `admin` and the password `admin`.

- Explore the DAGs view (the landing page) and the individual DAG view page to get a sense of the metadata available about the DAG, its runs, and all task instances. For a deep dive into the UI's features, see An introduction to the Airflow UI.
For example, the DAGs view will look like this screenshot:
As you start to trigger DAG runs, the graph view will look like this screenshot:
The Gantt chart will look like this screenshot:
Step 3: Explore the project
This Astro project introduces you to the basics of orchestrating pipelines with Airflow. You'll see how easy it is to:
- Get data from data sources.
- Generate tasks automatically and in parallel.
- Trigger downstream workflows automatically.
You'll build a lean, dynamic pipeline serving a common use case: extracting data from an API and loading it into a database!
This project uses DuckDB, an in-memory database. Although this type of database is great for learning Airflow, your data is not guaranteed to persist between executions!
For production applications, use a persistent database instead (consider DuckDB's hosted option MotherDuck or another database like Postgres, MySQL, or Snowflake).
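If you want to see the difference for yourself, here is a minimal sketch using the `duckdb` Python package; the file name `astronauts.db` is just an illustrative placeholder, not a file the project creates:

```python
import duckdb

# In-memory database: everything vanishes when the connection/process ends.
memory_conn = duckdb.connect(":memory:")

# File-backed database: data is written to disk and survives restarts.
# "astronauts.db" is a hypothetical file name used only for illustration.
file_conn = duckdb.connect("astronauts.db")
file_conn.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER, name VARCHAR)")
file_conn.execute("INSERT INTO demo VALUES (1, 'example')")
print(file_conn.execute("SELECT * FROM demo").fetchall())
```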
Pipeline structure
An Airflow instance can run any number of DAGs (directed acyclic graphs), which are your data pipelines in Airflow. This project has two:
`example_astronauts`
This DAG queries the list of astronauts currently in space from the Open Notify API, prints assorted data about the astronauts, and loads data into an in-memory database.
Tasks in the DAG are Python functions decorated using Airflow's TaskFlow API, which makes it easy to turn arbitrary Python code into Airflow tasks, automatically infer dependencies, and pass data between tasks.
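As a rough illustration of the pattern (not the project's actual code), a TaskFlow-style DAG might look like the sketch below; the DAG and task names are made up for this example:

```python
from airflow.decorators import dag, task
from pendulum import datetime


@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def taskflow_sketch():
    @task
    def extract() -> list[str]:
        # Any Python function becomes an Airflow task via the @task decorator.
        return ["Oleg Kononenko", "Nikolai Chub"]

    @task
    def report(names: list[str]) -> None:
        # Passing extract()'s return value here makes Airflow infer the
        # extract >> report dependency and pass the data between the tasks.
        print(f"{len(names)} astronauts are currently in space.")

    report(extract())


taskflow_sketch()
```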
- `get_astronaut_names` and `get_astronaut_numbers` make a JSON array and an integer available, respectively, to downstream tasks in the DAG.

- `print_astronaut_craft` and `print_astronauts` make use of this data in different ways. The third task, `print_astronaut_craft`, uses dynamic task mapping to create a parallel task for each astronaut in the list retrieved from the API. Airflow lets you do this with just two lines of code:

  ```python
  print_astronaut_craft.partial(greeting="Hello! :)").expand(
      person_in_space=get_astronaut_names()
  ),
  ```

  The key feature is the `expand()` function, which makes the DAG automatically adjust the number of tasks each time it runs (see the sketch after this list).

- `create_astronauts_table_in_duckdb` and `load_astronauts_in_duckdb` create a DuckDB database table for some of the data and load the data, respectively.
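Putting the pieces together, here is a simplified, hypothetical sketch of the dynamic task mapping pattern; it mirrors the structure of `example_astronauts` but is not the project's exact code:

```python
from airflow.decorators import dag, task
from pendulum import datetime


@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def astronauts_sketch():
    @task
    def get_astronaut_names() -> list[str]:
        # In the real DAG, this list comes from the Open Notify API.
        return ["Oleg Kononenko", "Nikolai Chub", "Tracy Caldwell Dyson"]

    @task
    def print_astronaut_craft(greeting: str, person_in_space: str) -> None:
        print(f"{greeting} {person_in_space} is currently in space.")

    # expand() creates one mapped task instance per element of the upstream list.
    print_astronaut_craft.partial(greeting="Hello! :)").expand(
        person_in_space=get_astronaut_names()
    )


astronauts_sketch()
```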
`example_extract_astronauts`

This DAG queries the database you created for astronaut data in `example_astronauts` and prints out some of this data. Changing a single line of code in this DAG can make it run automatically when the other DAG completes a run.
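A hedged sketch of what such a query task could look like; the database path and table name below are placeholders, not necessarily what the project uses:

```python
import duckdb
from airflow.decorators import task


@task
def print_astronaut_count() -> None:
    # Hypothetical file path and table name, for illustration only.
    conn = duckdb.connect("include/astronauts.db")
    count = conn.execute("SELECT COUNT(*) FROM astronauts").fetchone()[0]
    print(f"The astronauts table currently holds {count} rows.")
```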
Step 4: Get your hands dirty!
With Airflow, it's easy to create cross-workflow dependencies. In this step, you'll learn how to:
- Use Airflow Datasets to create a dependency between DAGs so that when one workflow ends, another begins. To do this, you'll modify the `example_extract_astronauts` DAG to use a Dataset to trigger a DAG run when the `example_astronauts` DAG updates the table that both DAGs query.
Schedule the `example_extract_astronauts` DAG on an Airflow Dataset
With Datasets, DAGs that access the same data can have explicit, visible relationships, and DAGs can be scheduled based on updates to these datasets. This feature helps make Airflow data-aware and expands Airflow scheduling capabilities beyond time-based methods such as cron. Downstream DAGs can be scheduled based on combinations of Dataset updates coming from tasks in the same Airflow instance or calls to the Airflow API.
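For example, a downstream DAG that should wait on more than one Dataset can take a list as its schedule; with a list, the DAG runs once every listed Dataset has been updated since its last run. This is a hedged sketch with placeholder Dataset names, not part of the project:

```python
from airflow import Dataset
from airflow.decorators import dag, task
from pendulum import datetime


@dag(
    start_date=datetime(2024, 1, 1),
    # Placeholder Dataset URIs: the DAG runs only after both have been updated.
    schedule=[Dataset("current_astronauts"), Dataset("current_spacecraft")],
    catchup=False,
)
def combined_dataset_consumer():
    @task
    def summarize() -> None:
        print("Both upstream Datasets were updated; running downstream work.")

    summarize()


combined_dataset_consumer()
```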
- Define the `get_astronaut_names` task as a producer of a Dataset. To do this, pass a Dataset object, encapsulated in a list, to the task's `outlets` parameter by altering the first `@task` in the DAG code:

  ```python
  @task(
      outlets=[Dataset("current_astronauts")]
  )
  def get_astronaut_names(**context) -> list[dict]:
  ```

  For more information about Airflow Datasets, see Datasets and data-aware scheduling in Airflow.
- Schedule a downstream DAG run using an Airflow Dataset:

  Now that you have defined the `get_astronaut_names` task in the `example_astronauts` DAG as a Dataset producer, you can use that Dataset to schedule downstream DAG runs.

  Datasets function like an API to communicate when data at a specific location in your ecosystem is ready for use, reducing the code required to create cross-DAG dependencies. For example, with an import and a single line of code, you can schedule a DAG to run when another DAG in the same Airflow environment has updated a Dataset.

  To schedule the `example_extract_astronauts` DAG to run when `example_astronauts` updates the `current_astronauts` Dataset, add an import statement to make the Airflow Dataset package available:

  ```python
  from airflow import Dataset
  ```
- Then, set the DAG's schedule using the `current_astronauts` Dataset (a sketch of how this looks in the DAG definition follows these steps):

  ```python
  schedule=[Dataset("current_astronauts")],
  ```
- Rerun the `example_astronauts` DAG in the UI and check the status of the tasks in the individual DAG view. Watch as the `example_extract_astronauts` DAG gets triggered automatically when `example_astronauts` finishes running.

  If all goes well, the graph view of the Dataset-triggered DAG run will look like this screenshot:
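In context, the new schedule sits in the DAG definition. A hedged sketch of how the `example_extract_astronauts` decorator might look with the Dataset schedule (the surrounding parameters are illustrative, not the project's exact code):

```python
from airflow import Dataset
from airflow.decorators import dag
from pendulum import datetime


@dag(
    start_date=datetime(2024, 1, 1),
    # Run whenever the current_astronauts Dataset is updated, instead of on
    # a time-based (cron) schedule.
    schedule=[Dataset("current_astronauts")],
    catchup=False,
)
def example_extract_astronauts():
    ...  # the tasks that query and print the astronaut data go here


example_extract_astronauts()
```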
For more information about Airflow Datasets, see: Datasets and data-aware scheduling in Airflow.
Next Steps: Run Airflow on Astro
The easiest way to run Airflow in production is with Astro. To get started, create an Astro trial. During your trial signup, you will have the option of choosing the same template project you worked with in this quickstart.