Use a MySQL or PostgreSQL database for metadata or result storage

You can use the Houston API to create Astronomer Software Deployments that use pre-created databases, external to the Airflow Deployment, as both a metadata and result storage backend.

Prerequisites

  • Workspace Admin user privileges and a Workspace ID
  • (Optional) A MySQL or PostgreSQL database
  • (Optional) An existing Deployment
If you create a new connection to an external database from a Deployment with existing DAG data, you must migrate that historic data to the new database yourself. Otherwise, information about your historic Deployment activity, such as task instances and DAG runs, is no longer displayed, because the database that stored that information has changed.

Step 1: Add configuration

  1. Open your values.yaml file.
  2. Add the following to your values.yaml file. These configurations enable the use of manual connection strings and provide an example of the database connection string format, which you can later update for your specific connection.

```yaml
deployments:
  manualConnectionStrings:
    enabled: true
  database:
    connection: postgresql://example_user:example_pass@host:5432/example_db
```

  3. Push the configuration change. See Apply a Config Change.

Step 2: (Optional) Create your database

If you need to create a new database, substitute astro-db-name with your own database name.

```sql
CREATE DATABASE astro-db-name;
```

Step 3: Add a user account to your database for the connection

Substitute astro-user-name and astro-user-password with your own information. You can use an existing database for this step.

  1. Create a user with a password for Astronomer Software to use to access the database.

```sql
CREATE USER astro-user-name WITH PASSWORD 'astro-user-password';
```

  2. Grant all privileges on the database to the user.

```sql
GRANT ALL PRIVILEGES ON DATABASE astro-db-name TO astro-user-name;
```

  3. Grant USAGE and CREATE privileges on the public schema to astro-user-name:

```sql
GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;
```

Now, connect to the database you created, which is astro-db-name in this example, and run the following queries:

  1. Grant all privileges on all tables, sequences, and functions to the user.

```sql
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO astro-user-name;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO astro-user-name;
GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO astro-user-name;
```

  2. Set default privileges for the user, so that any new tables, sequences, or functions are automatically accessible to the user.

```sql
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON TABLES TO astro-user-name;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON SEQUENCES TO astro-user-name;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL PRIVILEGES ON FUNCTIONS TO astro-user-name;
GRANT USAGE, CREATE ON SCHEMA public TO astro-user-name;
```
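If you provision databases for several Deployments, the grant statements above can be generated rather than typed by hand. The following is a minimal Python sketch; the user name is illustrative, and executing the statements (for example with a PostgreSQL driver) is left out:

```python
def grant_statements(user: str, schema: str = "public") -> list[str]:
    """Build the Step 3 GRANT and ALTER DEFAULT PRIVILEGES statements."""
    objects = ["TABLES", "SEQUENCES", "FUNCTIONS"]
    stmts = [f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {user};"]
    # Privileges on everything that already exists in the schema.
    stmts += [
        f"GRANT ALL PRIVILEGES ON ALL {obj} IN SCHEMA {schema} TO {user};"
        for obj in objects
    ]
    # Default privileges so future objects are covered automatically.
    stmts += [
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT ALL PRIVILEGES ON {obj} TO {user};"
        for obj in objects
    ]
    return stmts

for stmt in grant_statements("astro_user"):
    print(stmt)
```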

Step 4: Retrieve database host information

Retrieve the connection information for your external database. For example, with AWS you can retrieve your endpoint information by following Finding the connection information for an RDS for MySQL DB instance in the AWS documentation.
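Cloud consoles often report the endpoint as a single host:port value. If you script the following steps, a small helper can split it into the host and port you need for the connection string. This is a sketch; the endpoint value is illustrative:

```python
def split_endpoint(endpoint: str, default_port: int = 5432) -> tuple[str, int]:
    """Split a 'host:port' endpoint into its host and port parts."""
    host, sep, port = endpoint.rpartition(":")
    if not sep:  # no port in the endpoint, fall back to the default
        return endpoint, default_port
    return host, int(port)

host, port = split_endpoint("mydb.abc123.us-east-1.rds.amazonaws.com:5432")
```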

Step 5: Compose connection strings for your database

You need connection strings that define how Astronomer Software connects to your external databases from your Airflow Deployment. You use these values to define metadataConnection or resultBackendConnection when you create, update, or upsert your Deployment.

Use the values for your astro-user-name, astro-user-password, astro-db-name, and the host information you retrieved to compose the connection strings in the following format, depending on whether you want to define a result backend connection or a metadata database connection.

With PGBouncer disabled

  • metadataConnection:

    postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
  • resultBackendConnection:

    db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name
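When you script Deployment creation, you can build both strings from the same components; percent-encoding the password avoids a broken URI when it contains reserved characters. The following is a sketch, with illustrative user, password, host, and database names:

```python
from urllib.parse import quote

def connection_strings(user, password, host, db, port=5432):
    """Compose metadataConnection and resultBackendConnection strings.

    The result backend string is the same URI with the db+ prefix the
    Celery executor requires; the password is percent-encoded so that
    special characters don't break URI parsing.
    """
    uri = f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{db}"
    return {"metadataConnection": uri, "resultBackendConnection": f"db+{uri}"}

conns = connection_strings("astro_user", "p@ss", "db.example.internal", "astro_db")
print(conns["resultBackendConnection"])
# db+postgresql://astro_user:p%40ss@db.example.internal:5432/astro_db
```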

:::warning Celery Executor

The connection string format validation regex doesn't cover the result backend connection string format, which includes the db+ prefix. This prefix is specifically required by the Celery executor workers: if the connection string doesn't include db+, the Celery worker pods fail. Regex validation is not implemented for this format because it would complicate the format validation logic across different scenarios.

:::

With PGBouncer enabled

If you have PGBouncer enabled and are using Postgres, you must configure metadataConnectionJson and resultBackendConnectionJson instead.

Use the values for your astro-user-name, astro-user-password, astro-db-name, and the host information you retrieved to compose the connection strings in the following format, depending on whether you want to define a result backend connection or a metadata database connection.

  • metadataConnectionJson:

```json
"metadataConnectionJson": {
  "user": "astro-user-name",
  "pass": "astro-user-password",
  "protocol": "postgresql",
  "host": "host",
  "port": 5432,
  "db": "astro-db-name"
},
```

  • resultBackendConnectionJson:

```json
"resultBackendConnectionJson": {
  "user": "astro-user-name",
  "pass": "astro-user-password",
  "protocol": "postgresql",
  "host": "host",
  "port": 5432,
  "db": "astro-db-name"
},
```
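If you already have a postgresql:// URI, you can derive the JSON form from it rather than retyping each field. The following is a sketch using Python's standard library; the URI is illustrative. Note that in the JSON form the protocol stays plain postgresql for both connections, matching the example above:

```python
from urllib.parse import unquote, urlsplit

def connection_json(uri: str) -> dict:
    """Convert a postgresql:// URI into the *ConnectionJson field structure."""
    parts = urlsplit(uri)
    return {
        "user": unquote(parts.username),
        "pass": unquote(parts.password),
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or 5432,
        "db": parts.path.lstrip("/"),
    }

metadata_connection_json = connection_json(
    "postgresql://astro_user:astro_pass@db.example.internal:5432/astro_db"
)
```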

Step 6: Add to Deployment configuration

Use the Houston API to create your Deployment configuration.

The following example shows the mutation and queries for using createDeployment. See Houston API code examples for examples of how to use the update and upsert options when configuring your Deployment.

Create a new Deployment

```graphql
mutation createDeployment(
  $workspaceUuid: Uuid!
  $releaseName: String
  $namespace: String!
  $type: String!
  $label: String!
  $description: String
  $version: String
  $airflowVersion: String
  $runtimeVersion: String
  $executor: ExecutorType
  $workers: Workers
  $webserver: Webserver
  $scheduler: Scheduler
  $triggerer: Triggerer
  $config: JSON
  $properties: JSON
  $dagDeployment: DagDeployment
  $astroUnitsEnabled: Boolean
  $rollbackEnabled: Boolean
  $metadataConnection: String
  $resultBackendConnection: String
  $metadataConnectionJson: JSON
  $resultBackendConnectionJson: JSON
) {
  createDeployment(
    workspaceUuid: $workspaceUuid
    releaseName: $releaseName
    namespace: $namespace
    type: $type
    label: $label
    airflowVersion: $airflowVersion
    description: $description
    version: $version
    executor: $executor
    workers: $workers
    webserver: $webserver
    scheduler: $scheduler
    triggerer: $triggerer
    config: $config
    properties: $properties
    runtimeVersion: $runtimeVersion
    dagDeployment: $dagDeployment
    astroUnitsEnabled: $astroUnitsEnabled
    rollbackEnabled: $rollbackEnabled
    metadataConnection: $metadataConnection
    resultBackendConnection: $resultBackendConnection
    metadataConnectionJson: $metadataConnectionJson
    resultBackendConnectionJson: $resultBackendConnectionJson
  ) {
    id
    config
    urls {
      type
      url
      __typename
    }
    properties
    description
    label
    releaseName
    namespace
    status
    type
    version
    workspace {
      id
      label
      __typename
    }
    airflowVersion
    runtimeVersion
    desiredAirflowVersion
    dagDeployment {
      type
      nfsLocation
      repositoryUrl
      branchName
      syncInterval
      syncTimeout
      ephemeralStorage
      dagDirectoryLocation
      rev
      sshKey
      knownHosts
      __typename
    }
    createdAt
    updatedAt
    __typename
  }
}
```
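To run this mutation outside the Houston API playground, you can POST it as JSON. The sketch below uses only the Python standard library; the endpoint URL and Authorization token are assumptions based on the usual Houston API conventions, and build_request only assembles the request so you can inspect it before sending:

```python
import json
from urllib import request

# Paste the createDeployment mutation shown above into this string.
CREATE_DEPLOYMENT_MUTATION = "mutation createDeployment( ... ) { ... }"

def build_request(houston_url: str, token: str, variables: dict) -> request.Request:
    """Assemble an HTTP POST carrying the mutation and its variables."""
    body = json.dumps(
        {"query": CREATE_DEPLOYMENT_MUTATION, "variables": variables}
    ).encode()
    return request.Request(
        houston_url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )

# Example (hypothetical base domain and token):
# req = build_request("https://houston.example.com/v1", "<api-token>", variables)
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```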

JSON Query example

```json
{
  "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
  "metadataConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-password",
    "protocol": "postgresql",
    "host": "host",
    "port": 5432,
    "db": "astro-db-name"
  },
  "resultBackendConnectionJson": {
    "user": "astro-user-name",
    "pass": "astro-password",
    "protocol": "postgresql",
    "host": "postgres-db-lb.external-postgres.svc.cluster.local",
    "port": 5432,
    "db": "astro-db-name"
  },
  "namespace": "",
  "type": "airflow",
  "config": {
    "executor": "CeleryExecutor",
    "workers": {},
    "webserver": {},
    "scheduler": {
      "replicas": 1
    },
    "triggerer": {}
  },
  "executor": "CeleryExecutor",
  "workers": {},
  "webserver": {},
  "scheduler": {
    "replicas": 1
  },
  "triggerer": {},
  "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
  "description": "",
  "runtimeVersion": "11.6.0",
  "properties": {
    "extra_capacity": {
      "cpu": 1000,
      "memory": 3840
    }
  },
  "astroUnitsEnabled": false,
  "rollbackEnabled": true,
  "dagDeployment": {
    "type": "dag_deploy",
    "nfsLocation": "",
    "repositoryUrl": "",
    "branchName": "",
    "syncInterval": 1,
    "syncTimeout": 120,
    "ephemeralStorage": 2,
    "dagDirectoryLocation": "",
    "rev": "",
    "sshKey": "",
    "knownHosts": ""
  }
}
```

Example query string variables

```json
{
  "workspaceUuid": "cm3g0cjd2000008l74jigb54y",
  "metadataConnection": "postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
  "resultBackendConnection": "db+postgresql://astro-user-name:astro-user-password@host:5432/astro-db-name",
  "namespace": "",
  "type": "airflow",
  "config": {
    "executor": "CeleryExecutor",
    "workers": {},
    "webserver": {},
    "scheduler": {
      "replicas": 1
    },
    "triggerer": {}
  },
  "executor": "CeleryExecutor",
  "workers": {},
  "webserver": {},
  "scheduler": {
    "replicas": 1
  },
  "triggerer": {},
  "label": "Rt1160-Celery-Pgbouncer-Enabled-Json-5",
  "description": "",
  "runtimeVersion": "11.6.0",
  "properties": {
    "extra_capacity": {
      "cpu": 1000,
      "memory": 3840
    }
  },
  "astroUnitsEnabled": false,
  "rollbackEnabled": true,
  "dagDeployment": {
    "type": "dag_deploy",
    "nfsLocation": "",
    "repositoryUrl": "",
    "branchName": "",
    "syncInterval": 1,
    "syncTimeout": 120,
    "ephemeralStorage": 2,
    "dagDirectoryLocation": "",
    "rev": "",
    "sshKey": "",
    "knownHosts": ""
  }
}
```