Remote Execution Agents
Overview
Remote Execution relies on Agents in your environment (the execution plane) to communicate with the API server in the Astro orchestration plane. Each Agent sends heartbeats that report its capabilities and the queue it is listening on, and the API server responds by assigning work accordingly. Worker Agents run synchronous tasks, Triggerer Agents run asynchronous tasks and deferrable operators, and the DAG Processor Agent processes dags and sends their serialized representations to the orchestration plane.
You can register multiple remote queues for your Remote Execution Agents for the same kinds of reasons you would register multiple worker queues in Hosted execution mode.
Add https://<clusterId>.external.astronomer.run/ to your organization's network allowlist so that the Remote Execution Agents in your environment can heartbeat to the API server in the Astro orchestration plane.
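As a small illustration, the allowlist entry can be derived from the cluster ID. The sketch below uses hypothetical helper names and a made-up cluster ID, none of which are part of Astro. It assumes standard HTTPS on port 443 for firewalls that take a host and port rather than a URL.

```python
from urllib.parse import urlsplit


def allowlist_url(cluster_id: str) -> str:
    """Build the URL to allowlist for a cluster ID, using the pattern given above."""
    cluster_id = cluster_id.strip()
    # Reject obviously malformed IDs; the real ID format is not specified here.
    if not cluster_id or any(c in cluster_id for c in "/:. "):
        raise ValueError(f"unexpected cluster ID: {cluster_id!r}")
    return f"https://{cluster_id}.external.astronomer.run/"


def allowlist_host_port(cluster_id: str) -> tuple[str, int]:
    """Return (host, port) for firewalls that use host/port pairs instead of URLs."""
    host = urlsplit(allowlist_url(cluster_id)).hostname
    return host, 443  # assumption: default HTTPS port


# "clabc123" is a made-up cluster ID used only for illustration.
print(allowlist_url("clabc123"))
print(allowlist_host_port("clabc123"))
```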
Register Remote Execution Agent with a Deployment
Prerequisites
- Kubernetes 1.30+
- Helm 3+
- A valid Astro account with permissions to create an Astro Agent
- An Agent Token from your Astro Deployment
- An API Deployment Token with Deployment Admin scoped permissions to pull the base Astro Remote Execution Agent Image
Step 1: Create Agent Token
- In the Astro UI, select a Workspace, click Deployments, and then select the Remote Execution Deployment that you want to register the Remote Execution Agent with.
- Select the Remote Agents tab.
- Use the toggle to switch to the Tokens view.
- Click +Agent Token.
- Define a Name, Expiration and optionally a Description for the Agent Token.
- Copy the Agent Token. You will not be shown the Agent Token value again.
Step 2: Install the Helm chart
- Use the toggle to switch back to the Agents view, and click Register a Remote Agent.
- Click the Download button on the modal to download the values.yaml file. Use this file to configure the Helm chart for the Remote Execution Agent.
- Most config options in values.yaml do not need to be updated to register and activate the Remote Agent. The following values in the Remote Agent's Helm chart need to be updated: resourceNamePrefix, namespace, secretBackend, xcomBackend, imagePullSecretName, and agentToken or agentTokenSecretName. A description of each value is provided in the Helm chart itself.
- Install the Helm chart. The following commands also add the Helm repo and apply any updates:
helm repo add astronomer https://helm.astronomer.io
helm repo update
helm install astro-agent astronomer/astro-remote-execution-agent -f values.yaml
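Before running the install, it can help to confirm that the values named in this step are filled in. The following is a minimal sketch, not part of the chart or the Astro tooling. It assumes the values have already been loaded into a Python dict (for example with a YAML parser) and that these keys sit at the top level of values.yaml, which may not match the real file layout. The function name and example values are hypothetical.

```python
# Keys this page says must be updated before registering the Agent.
REQUIRED_KEYS = ("resourceNamePrefix", "namespace", "secretBackend", "xcomBackend")

# Groups where at least one key in the group must be set.
ONE_OF_GROUPS = (
    ("agentToken", "agentTokenSecretName"),
    ("imagePullSecretName", "imagePullSecretData"),
)


def missing_values(values: dict) -> list[str]:
    """Return a description of each required setting that is empty or absent."""
    problems = [key for key in REQUIRED_KEYS if not values.get(key)]
    for group in ONE_OF_GROUPS:
        if not any(values.get(key) for key in group):
            problems.append(" or ".join(group))
    return problems


# Hypothetical example values for illustration only.
example = {
    "resourceNamePrefix": "astro-agent",
    "namespace": "astro-agent",
    "secretBackend": "airflow.secrets.local_filesystem.LocalFilesystemBackend",
    "xcomBackend": "",
    "agentTokenSecretName": "my-agent-token",
}
print(missing_values(example))
```

An empty list means every value called out in this step has been provided. It does not validate the values themselves.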
Step 3: Set allowed IP address ranges
Setting allowed IP address ranges for a Remote Execution Deployment is optional, but it lets you restrict the Deployment's incoming traffic to traffic from the Remote Agents in your environment.
- In the Set Allowed IP Address Ranges step on the modal, click edit this Deployment.
- Click +Add IP.
- Add the IP address range, then click Add.
- Repeat to allowlist more IP address ranges.
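Before saving the ranges, you may want to check that the address your Agents egress from falls inside at least one of them. The sketch below uses Python's standard ipaddress module and assumes the ranges are written in CIDR notation. The addresses shown come from documentation-only ranges and are placeholders.

```python
import ipaddress


def is_allowed(source_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if source_ip falls inside any of the given CIDR ranges."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_ranges
    )


# Placeholder ranges; replace them with the ranges you plan to allowlist.
ranges = ["203.0.113.0/24", "198.51.100.10/32"]
print(is_allowed("203.0.113.7", ranges))
print(is_allowed("192.0.2.1", ranges))
```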
Step 4: Check for Agent heartbeat
Close the modal to view the Agent list page and check for a healthy agent.
You should see at least one agent with a health status of "Healthy" and a value for "last heartbeat" no more than one minute prior. If your Remote Agent is healthy, configure dagBundleConfigList in values.yaml and run a Helm upgrade. You can now run Dags on this Agent.
Remote Execution Agents Helm chart configuration reference
The following Helm chart config values are required to run dags with your Remote Execution Agent. To see all available configuration options with descriptions, see the values.yaml file downloaded in Step 2.
agentToken or agentTokenSecretName
You must specify either agentToken or agentTokenSecretName to inject the Agent Token generated in Step 1 into the Helm chart.
- For agentToken, pass the Agent Token value copied in Step 1 into the Helm chart as a string, and Helm will create a secret with the name agent-token-secret in the namespace.
- For agentTokenSecretName, pass the name of the secret containing the Agent Token to connect the Agent to the Astro Deployment. Use this field when the secret has already been created in the namespace through your own external mechanism. The secret must contain a key named token with the value set to the Agent Token created in Step 1.
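If you create the secret yourself, one option is to generate a Kubernetes Secret manifest and apply it with your usual tooling. The sketch below only builds the manifest. The secret name, namespace, and token shown are placeholders, and the only requirement taken from this page is the key named token. Kubernetes expects values under data to be base64-encoded.

```python
import base64
import json


def agent_token_secret(name: str, namespace: str, agent_token: str) -> dict:
    """Build a Secret manifest whose 'token' key holds the Agent Token."""
    encoded = base64.b64encode(agent_token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"token": encoded},  # key name required by the chart, per the text above
    }


# Placeholder values for illustration only.
manifest = agent_token_secret("my-agent-token", "astro-agent", "<agentToken>")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be saved to a file and applied with kubectl apply -f. You would then set agentTokenSecretName to the same secret name.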
imagePullSecretName or imagePullSecretData
You must specify either imagePullSecretName or imagePullSecretData to allow the Remote Execution Agent to pull container images from the registry. You can use the Astro Deployment API Token listed in the prerequisites for this.
- For imagePullSecretName, provide the name of an existing Kubernetes secret in the namespace. This secret must contain a key named .dockerconfigjson with your image pull credentials. Use this option if the secret has already been created. Example Kubernetes command to create the secret:
kubectl create secret docker-registry -n <namespace> <secretName> \
--docker-server=images.astronomer.cloud \
--docker-username=cli \
--docker-password=<astroToken>
- For imagePullSecretData, provide the Docker config JSON as a string. The Helm chart will use this value to create a secret named image-pull-secret in the namespace. The value must follow the standard Docker config format:
{
"auths": {
"<registry.example.com>": {
"auth": "<auth-token>",
"email": "<email-address>"
}
}
}
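In the standard Docker config format, the auth field is the base64 encoding of username:password. The sketch below builds that JSON string under this convention. The registry and username match the kubectl example above, the token is a placeholder, and the helper name is hypothetical.

```python
import base64
import json
from typing import Optional


def docker_config_json(
    registry: str, username: str, password: str, email: Optional[str] = None
) -> str:
    """Return a Docker config JSON string with credentials for one registry."""
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    entry = {"auth": auth}
    if email:
        entry["email"] = email  # optional field, shown in the format above
    return json.dumps({"auths": {registry: entry}})


# Placeholder token; the registry and username come from the kubectl example.
config = docker_config_json("images.astronomer.cloud", "cli", "<astroToken>")
print(config)
```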
namespace
Specifies the Kubernetes namespace where the Astro Remote Agent will be deployed.
If createNamespace is set to true, the Helm chart will create the namespace with the name provided in this field.
If createNamespace is set to false, the namespace must be created manually before deploying the chart, and this field should reference the existing namespace.
See Install Remote Execution Agents in a restricted kubernetes namespace for steps to configure an agent in a Kubernetes namespace.
If you are passing agentTokenSecretName and imagePullSecretName, createNamespace must be set to false and the namespace must be created manually with those secrets already present.
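The interaction between createNamespace and the secret name fields can be checked mechanically. The sketch below is a hypothetical helper that assumes the values are available as a Python dict with these keys at the top level. It treats a reference to either pre-existing secret as requiring createNamespace to be false, which is a slightly stricter reading than the sentence above, and it only flags an explicit true because the chart's default is not stated here.

```python
def namespace_conflicts(values: dict) -> list[str]:
    """Flag secret-name fields that conflict with createNamespace being true."""
    referenced = [
        key
        for key in ("agentTokenSecretName", "imagePullSecretName")
        if values.get(key)
    ]
    if referenced and values.get("createNamespace") is True:
        return [
            f"{key} refers to an existing secret, so createNamespace must be false"
            for key in referenced
        ]
    return []


# Hypothetical values for illustration only.
print(namespace_conflicts({"createNamespace": True, "agentTokenSecretName": "tok"}))
print(namespace_conflicts({"createNamespace": False, "agentTokenSecretName": "tok"}))
```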
resourceNamePrefix
Specifies a name prefix for all Kubernetes resources that this chart deploys for Remote Execution Agent components, such as Deployments, ConfigMaps, and Secrets.
secretBackend
The Airflow secret backend class to use for the Agent. See Secrets backend integration for Remote Execution for external secrets providers configuration instructions.
Although it is not recommended for production use, you can also use Airflow’s local filesystem backend as a simpler alternative to an external secrets provider by setting:
secretBackend: "airflow.secrets.local_filesystem.LocalFilesystemBackend"
The rest of the configuration should be passed as environment variables to the Agent components.
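For the local filesystem backend, Airflow reads backend arguments from the AIRFLOW__SECRETS__BACKEND_KWARGS environment variable as a JSON object, and LocalFilesystemBackend accepts variables_file_path and connections_file_path. The sketch below only builds that environment variable. The file paths are placeholders, the helper name is hypothetical, and how the variable is injected into the Agent components through the chart is not covered here.

```python
import json


def local_filesystem_backend_env(
    variables_file_path: str, connections_file_path: str
) -> dict[str, str]:
    """Build the env var that passes kwargs to Airflow's LocalFilesystemBackend."""
    kwargs = {
        "variables_file_path": variables_file_path,
        "connections_file_path": connections_file_path,
    }
    return {"AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(kwargs)}


# Placeholder paths; these files would need to exist inside the Agent containers.
env = local_filesystem_backend_env("/secrets/variables.json", "/secrets/connections.json")
for name, value in env.items():
    print(f"{name}={value}")
```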
xcomBackend
The Airflow XCom backend class to use for the Agent.
See Configure XCOM backend for a Remote Execution Agent for how to set up the XCom Backend in the Remote Execution Agent components.
Failures
When the heartbeat between the API server and a Remote Execution Agent is disrupted, the Astro executor prevents task duplication by marking in-flight tasks from failed Agents as failed. This makes them available for reassignment to healthy Agents. An Agent must receive confirmation from the API server before starting a task, ensuring only actively monitored tasks proceed. If an Agent loses connection to the API server, it will continue executing already assigned tasks but will not start new ones until a successful heartbeat is re-established.
Agent failure
The API Server considers an Agent to have failed when the time since its last heartbeat exceeds 3x the expected heartbeat interval. Once an Agent is considered to have failed, the API Server determines whether it had any in-flight work (tasks that were assigned to that Agent but were not reported back as completed) and marks those tasks as failed. This makes those tasks available so that they can be assigned to the next Agent that heartbeats.
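The failure rule above can be sketched as a simple timeout check. This is an illustrative model only, not the Astro executor's implementation. The class, field, and function names are hypothetical, and timestamps are plain seconds.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    agent_id: str
    last_heartbeat: float  # seconds, same clock as "now"
    in_flight: set[str] = field(default_factory=set)  # assigned, not yet completed


def has_failed(agent: AgentRecord, now: float, heartbeat_interval: float) -> bool:
    """An Agent has failed once its heartbeat age exceeds 3x the interval."""
    return (now - agent.last_heartbeat) > 3 * heartbeat_interval


def release_tasks_of_failed_agents(
    agents: list[AgentRecord], now: float, heartbeat_interval: float
) -> list[str]:
    """Return task IDs marked as failed, which makes them available for reassignment."""
    released: list[str] = []
    for agent in agents:
        if has_failed(agent, now, heartbeat_interval):
            released.extend(sorted(agent.in_flight))
            agent.in_flight.clear()
    return released


agents = [
    AgentRecord("agent-a", last_heartbeat=0.0, in_flight={"t1", "t2"}),
    AgentRecord("agent-b", last_heartbeat=95.0, in_flight={"t3"}),
]
# With a 10 second interval, agent-a (100 s since heartbeat) has failed; agent-b has not.
print(release_tasks_of_failed_agents(agents, now=100.0, heartbeat_interval=10.0))
```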
API Server failure
If an Agent is unable to reach the API Server for its heartbeats, it will assume that other Agents are still able to contact the API Server. As such, the Agent will proceed with any already in-flight tasks, since it can be sure that they will not be reassigned to other Agents for execution. However, the Agent will not start any new tasks: because the API Server has not yet had a chance to mark such a task as "running", the task could be reassigned to another Agent.
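From the Agent's side, this behavior amounts to gating new task starts on a healthy heartbeat and an explicit confirmation, while leaving running tasks alone. The following is a toy model under those assumptions. The class and method names are hypothetical and do not reflect the Agent's actual code.

```python
class AgentState:
    """Toy model of when an Agent may start or keep running tasks."""

    def __init__(self) -> None:
        self.connected = True  # result of the most recent heartbeat
        self.running: set[str] = set()

    def on_heartbeat(self, succeeded: bool) -> None:
        # Losing the connection does not cancel anything already running.
        self.connected = succeeded

    def try_start(self, task_id: str, confirmed_by_api_server: bool) -> bool:
        """Start a task only when connected and the API server confirmed it."""
        if not self.connected or not confirmed_by_api_server:
            return False
        self.running.add(task_id)
        return True


state = AgentState()
state.try_start("t1", confirmed_by_api_server=True)  # started
state.on_heartbeat(succeeded=False)                   # API server unreachable
print(state.try_start("t2", confirmed_by_api_server=True), sorted(state.running))
```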
Manage Remote Execution Agents
On the Remote Agents tab of the Deployment, click More options for a Remote Execution Agent on the list to access an action menu for that Agent.
You can take the following actions on your registered Remote Execution Agents:
- Cordon: Cordoning a Remote Execution Agent marks it as unavailable for scheduling new tasks, while allowing it to continue running and complete any tasks already in progress.
This allows you to gracefully remove the Agent from service without interrupting current workloads. For example, you can cordon an Agent before deleting it or performing maintenance, such as an upgrade, on the Agent or its underlying infrastructure.
A cordoned Agent will not receive new work, but it remains active until all running tasks have finished. Once you are ready to reintroduce the Agent to the task pool, you can uncordon it to resume normal operation.
- Uncordon: Uncordoning a Remote Execution Agent re-enables it to receive new tasks and resume normal scheduling.
- Delete: Deletes the Remote Execution Agent from the Deployment.
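The cordon semantics described above can be modeled in a few lines: a cordoned Agent is skipped for new work, and it is safe to take out of service once its running tasks have finished. This is a conceptual sketch with hypothetical names, not the Astro API.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    cordoned: bool = False
    running_tasks: set[str] = field(default_factory=set)


def schedulable(agents: list[Agent]) -> list[Agent]:
    """Agents that may receive new tasks (cordoned Agents are excluded)."""
    return [agent for agent in agents if not agent.cordoned]


def is_drained(agent: Agent) -> bool:
    """A cordoned Agent with no running tasks can be maintained or deleted."""
    return agent.cordoned and not agent.running_tasks


pool = [Agent("agent-a", running_tasks={"t1"}), Agent("agent-b")]
pool[0].cordoned = True             # cordon agent-a; t1 keeps running
print([a.name for a in schedulable(pool)], is_drained(pool[0]))
pool[0].running_tasks.discard("t1")  # t1 finishes
print(is_drained(pool[0]))
```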
Remote Execution Agent maintenance policy
Each Remote Execution Agent minor version is maintained for 6 months from the release month.
See Maintenance policy for more details about versioning, support, and upgrade recommendations.
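As a rough planning aid, the end of a maintenance window can be estimated from the release month. The sketch below assumes "6 months from the release month" means the window ends six calendar months after the release month. That reading is an assumption, so confirm exact dates against the Maintenance policy. The function name and example dates are hypothetical.

```python
def maintenance_end(release_year: int, release_month: int, months: int = 6) -> tuple[int, int]:
    """Return (year, month) that falls `months` calendar months after the release month."""
    if not 1 <= release_month <= 12:
        raise ValueError("release_month must be between 1 and 12")
    index = release_year * 12 + (release_month - 1) + months
    year, month_zero_based = divmod(index, 12)
    return year, month_zero_based + 1


# Hypothetical release in September 2025.
print(maintenance_end(2025, 9))
```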