
Overview

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. This template deploys a full Airflow stack using CeleryExecutor, with Redis as the task queue broker and PostgreSQL as the metadata database. Celery workers can optionally be autoscaled using KEDA based on queue depth.

What Gets Created

  • GVC — A dedicated GVC across the specified locations.
  • Airflow Webserver — The Airflow web UI for managing DAGs, monitoring task execution, and viewing logs.
  • Celery Workers — Distributed task execution workers that process DAG tasks.
  • Redis — A Redis broker for the Celery task queue, with persistent storage.
  • PostgreSQL — A PostgreSQL database for Airflow metadata storage.
  • Volume Sets — Persistent storage for Airflow DAG data, PostgreSQL, and Redis.
  • KEDA ScaledObject (optional) — Automatically scales Celery workers up or down based on Redis queue length.
  • Secret — A dictionary secret containing the PostgreSQL credentials, JWT signing key, Fernet encryption key, and admin password, shared across all Airflow workloads.
  • Identity & Policy — An identity bound to the workloads with reveal access to the Airflow configuration secret.

Pre-Deployment Checklist

Before deploying, generate and set the following required values in values.yaml:
  • airflow.auth.jwtSecret — generate with openssl rand -base64 48
  • airflow.auth.fernetKey — generate with python3 -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'
  • airflow.admin.password — choose a strong password
  • postgres.config.password — choose a strong password
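If you prefer to generate all of these values in one place, the following stdlib-only Python sketch produces equivalent output (a Fernet key is simply 32 random bytes in URL-safe base64, so the cryptography package is not needed at generation time):

```python
import base64
import os
import secrets

# JWT signing secret: 48 random bytes, standard base64
# (equivalent to "openssl rand -base64 48" -> 64 base64 characters)
jwt_secret = base64.b64encode(os.urandom(48)).decode()

# Fernet key: 32 random bytes, URL-safe base64 -- the same format that
# cryptography's Fernet.generate_key() emits
fernet_key = base64.urlsafe_b64encode(os.urandom(32)).decode()

# Strong random passwords for the admin account and PostgreSQL
admin_password = secrets.token_urlsafe(24)
postgres_password = secrets.token_urlsafe(24)

print(len(jwt_secret), len(fernet_key))  # 64 44
```

Copy each value into the corresponding field of values.yaml before deploying.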

Installation

This template has no external prerequisites; install it using your preferred deployment method.

Configuration

The default values.yaml for this template:
# Global Virtual Cloud (GVC) settings
gvc:
  name: airflow
  locations:
    - name: aws-eu-central-1

# Postgres database configuration
postgres:
  image: postgres:18
  resources:
    minCpu: 250m
    maxCpu: 500m
    minMemory: 512Mi
    maxMemory: 1024Mi
  config:
    username: username
    password: password
    database: airflow
  volumeset:
    capacity: 10 # initial capacity in GiB (minimum is 10)

# Redis cache configuration
redis:
  image: redis:7.4
  resources:
    cpu: 250m
    memory: 512Mi
  volumeset:
    capacity: 10 # initial capacity in GiB (minimum is 10)

# Apache Airflow configuration
airflow:
  webserver:
    image: apache/airflow:3.0.3
    resources:
      cpu: 2000m
      memory: 3Gi
  celeryWorker:
    image: controlplanecorporation/celery:v1
    resources:
      cpu: 256m
      memory: 512Mi
  webPort: 8080 # Port for accessing the Airflow web interface

  auth:
    jwtSecret: CHANGE_ME # REQUIRED: generate with "openssl rand -base64 48"
    jwtExpirationDelta: 3600 # JWT token expiration time in seconds
    jwtRefreshThreshold: 300 # Threshold before token expires to allow refresh (seconds)
    fernetKey: CHANGE_ME # REQUIRED: generate with "python3 -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'"

  admin:
    username: admin
    password: CHANGE_ME # REQUIRED: change before deploying to production

  scheduler:
    dagDirListInterval: 10 # How often to check DAG folder (seconds)
    minFileProcessInterval: 10 # Minimum interval to process DAG files (seconds)

  celery:
    workerConcurrency: 1 # Number of tasks each worker can run concurrently

volumeset:
  airflow:
    capacity: 10 # initial capacity in GiB (minimum is 10)

# Firewall configuration
firewallConfig:
  inboundAllowCIDR:
    - 0.0.0.0/0 # Restrict to specific IPs in production (e.g. - 203.0.113.0/24)

# Git-sync configuration for DAG delivery
gitSync:
  enabled: false
  repo: "" # Git repository URL (e.g. https://github.com/org/dags)
  branch: main # Branch to sync
  period: 60s # How often to sync
  subPath: "" # Optional subfolder within the repo containing DAGs
  auth:
    token: "" # Personal access token for private repos (leave empty for public repos)

# KEDA (Kubernetes Event-driven Autoscaling) configuration
# NOT SUPPORTED in gcp/us-central1
keda:
  enabled: true # Enable or disable KEDA autoscaling
  minScale: 1 # Minimum number of Celery workers
  maxScale: 3 # Maximum number of Celery workers
  scaleToZeroDelay: 300 # Time before scaling to zero (seconds)
  listLength: 3 # Queue length threshold to trigger scaling
  cooldownPeriod: 1 # Cooldown between scaling events (seconds)
  initialCooldownPeriod: 1 # Cooldown after startup before scaling (seconds)
  pollingInterval: 4 # Interval at which KEDA queries metrics (seconds)

GVC

  • gvc.name — The name of the GVC. Must be unique per deployment.
  • gvc.locations — List of cloud locations to deploy to (e.g., aws-eu-central-1).

PostgreSQL

  • postgres.image — PostgreSQL Docker image.
  • postgres.resources — CPU and memory bounds for the PostgreSQL workload (minCpu, maxCpu, minMemory, maxMemory).
  • postgres.config.username / postgres.config.password — Database credentials. Change the default password before deploying to production.
  • postgres.config.database — Name of the Airflow metadata database (default: airflow).
  • postgres.volumeset.capacity — Persistent storage for PostgreSQL data (GiB, minimum 10).

Redis

  • redis.image — Redis Docker image.
  • redis.resources — CPU and memory allocated to Redis.
  • redis.volumeset.capacity — Persistent storage for Redis data (GiB, minimum 10).

Airflow Webserver and Workers

  • airflow.webserver.image / airflow.celeryWorker.image — Docker images for the webserver and Celery workers.
  • airflow.webserver.resources / airflow.celeryWorker.resources — CPU and memory per component.
  • airflow.webPort — Port the Airflow web UI listens on (default 8080).

Authentication

Airflow 3.x requires the following authentication settings; the two secret values must be changed before deploying to production:
  • airflow.auth.jwtSecret — Secret key used to sign JWT tokens for API authentication. Generate a secure value with:
    openssl rand -base64 48
    
  • airflow.auth.fernetKey — Key used to encrypt stored connections and variables. Generate with:
    python3 -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())'
    
  • airflow.auth.jwtExpirationDelta — Token lifetime in seconds (default 3600).
  • airflow.auth.jwtRefreshThreshold — Seconds before expiry at which a token refresh is allowed (default 300).
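To make the two timing settings concrete, here is an illustrative sketch of how the defaults interact: a token issued at time t expires at t + 3600 s and becomes eligible for refresh 300 s before that.

```python
from datetime import datetime, timedelta

JWT_EXPIRATION_DELTA = 3600  # airflow.auth.jwtExpirationDelta
JWT_REFRESH_THRESHOLD = 300  # airflow.auth.jwtRefreshThreshold

def refresh_window(issued_at):
    """Interval during which a token can be refreshed before it expires."""
    expires_at = issued_at + timedelta(seconds=JWT_EXPIRATION_DELTA)
    return expires_at - timedelta(seconds=JWT_REFRESH_THRESHOLD), expires_at

issued = datetime(2025, 1, 1, 12, 0, 0)
start, end = refresh_window(issued)
print(start.time(), end.time())  # 12:55:00 13:00:00
```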

Admin Account

  • airflow.admin.username — Username for the initial Airflow admin account (default: admin).
  • airflow.admin.password — Password for the initial admin account. Change before deploying to production.
The admin user is created on first startup using Airflow’s SimpleAuthManager. Credentials are written to a password file on the shared volume and re-applied on every container restart, so the password always reflects the current value in values.yaml.
SimpleAuthManager is the default auth manager in Airflow 3.x and is suitable for development and internal deployments. For production deployments requiring SSO or LDAP, consider integrating an external auth provider via OAuth/OIDC.

Scheduler

  • airflow.scheduler.dagDirListInterval — How often the scheduler scans the DAG folder for new or modified files (seconds).
  • airflow.scheduler.minFileProcessInterval — Minimum interval between processing the same DAG file (seconds).

Celery

  • airflow.celery.workerConcurrency — Number of tasks a single Celery worker can execute concurrently.
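A quick sanity check on throughput: total task parallelism is bounded by workerConcurrency multiplied by the number of running workers. Using this template's default values:

```python
worker_concurrency = 1  # airflow.celery.workerConcurrency
max_workers = 3         # keda.maxScale (upper bound on Celery workers)

# Upper bound on simultaneously running tasks across the deployment
max_parallel_tasks = worker_concurrency * max_workers
print(max_parallel_tasks)  # 3
```

Raise workerConcurrency (and the worker resource limits) if individual tasks are lightweight and queue depth grows faster than KEDA can add workers.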

Storage

  • volumeset.airflow.capacity — Persistent storage for the Airflow home directory shared across workloads (GiB, minimum 10).
PostgreSQL and Redis storage are configured under their respective sections (postgres.volumeset.capacity and redis.volumeset.capacity).
The Airflow volume uses a shared (NFS-style) filesystem, allowing both the webserver and Celery workers to read DAGs and write logs to the same volume.

Firewall

  • firewallConfig.inboundAllowCIDR — List of CIDR ranges allowed to reach the Airflow webserver. Defaults to 0.0.0.0/0 (public). Restrict to specific IP ranges in production.

Git-Sync

Git-sync runs as a sidecar container on the webserver and Celery worker workloads, continuously pulling DAGs from a Git repository into the shared Airflow volume. This is the recommended approach for managing DAGs in production.
  • gitSync.enabled — Enable or disable the git-sync sidecar.
  • gitSync.repo — Git repository URL (e.g. https://github.com/org/dags).
  • gitSync.branch — Branch to sync (default: main).
  • gitSync.period — Sync interval (default: 60s).
  • gitSync.subPath — Optional subfolder within the repo containing DAG files.
  • gitSync.auth.token — Personal access token for private repositories (leave empty for public repos).
When git-sync is disabled, DAGs can be placed manually in the /opt/airflow/dags directory on the Airflow volume.
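For reference, an enabled git-sync stanza in values.yaml might look like the following (the repository URL is a placeholder; substitute your own DAG repository):

```yaml
gitSync:
  enabled: true
  repo: https://github.com/org/dags # placeholder; point at your DAG repository
  branch: main
  period: 60s
  subPath: "" # set if DAGs live in a subfolder of the repo
  auth:
    token: "" # required only for private repositories
```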

KEDA Autoscaling

KEDA scales Celery workers automatically based on the Redis queue length.
KEDA is not supported in gcp/us-central1.
  • keda.enabled — Enable or disable KEDA autoscaling.
  • keda.minScale — Minimum number of Celery workers.
  • keda.maxScale — Maximum number of Celery workers.
  • keda.scaleToZeroDelay — Seconds of inactivity before scaling to zero.
  • keda.listLength — Redis queue length that triggers a scale-up.
  • keda.cooldownPeriod — Seconds to wait between scaling events.
  • keda.initialCooldownPeriod — Seconds after startup before autoscaling activates.
  • keda.pollingInterval — Interval at which KEDA queries Redis for metrics (seconds).
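Roughly speaking, KEDA (via the autoscaler it drives) targets ceil(queue_length / listLength) replicas, clamped to the configured bounds — a sketch using this template's defaults (note that with minScale set to 1 the worker count never actually reaches zero):

```python
import math

MIN_SCALE, MAX_SCALE = 1, 3  # keda.minScale / keda.maxScale
LIST_LENGTH = 3              # keda.listLength (queue items per worker)

def desired_workers(queue_depth):
    # ceil(metric / threshold), clamped to [minScale, maxScale]
    return max(MIN_SCALE, min(MAX_SCALE, math.ceil(queue_depth / LIST_LENGTH)))

print([desired_workers(n) for n in (0, 3, 7, 50)])  # [1, 1, 3, 3]
```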

Connecting to Airflow

Once deployed, the Airflow web UI is available at the workload’s canonical endpoint:
https://<gvc-name>-airflow-webserver.<gvc-name>.cpln.app
Log in with the airflow.admin.username and airflow.admin.password set in values.yaml.
This template creates a GVC with a default name defined in the values file. If you plan to deploy multiple instances, you must assign a unique GVC name for each deployment.

API Access

Airflow 3.x uses JWT-based authentication for API access. To obtain a token:
curl -X POST https://<your-airflow-url>/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your-password"}'
Use the returned token for subsequent API requests:
curl https://<your-airflow-url>/api/v2/dags \
  -H "Authorization: Bearer <token>"
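The same two-step flow can be scripted in Python with the stdlib urllib — a sketch only (the URL below is a placeholder, and no request is actually sent here):

```python
import json
import urllib.request

AIRFLOW_URL = "https://example.invalid"  # placeholder: your Airflow endpoint

def token_request(username, password):
    """Build the POST /auth/token request that exchanges credentials for a JWT."""
    return urllib.request.Request(
        f"{AIRFLOW_URL}/auth/token",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def dags_request(token):
    """Build the GET /api/v2/dags request authenticated with the bearer token."""
    return urllib.request.Request(
        f"{AIRFLOW_URL}/api/v2/dags",
        headers={"Authorization": f"Bearer {token}"},
    )

req = token_request("admin", "your-password")
print(req.get_method(), req.full_url)
```

Send each request with urllib.request.urlopen(req) once AIRFLOW_URL points at a live deployment.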

Production Considerations

  • Change all CHANGE_ME values before deploying — jwtSecret, fernetKey, admin.password, and postgres.config.password are all required.
  • Restrict firewallConfig.inboundAllowCIDR to trusted IP ranges to limit access to the Airflow UI.
  • Enable git-sync for reliable, version-controlled DAG delivery.
  • Auth: SimpleAuthManager is not recommended for deployments requiring enterprise SSO. Evaluate an OAuth/OIDC integration for those use cases.
