Running Airflow in separate environments

written: Feb 1, 2021

This is a primer on how to set up Apache Airflow on a CentOS 7 server for 2 separate environments (Prod and Simulation).

My company needed 2 separate envs, one for Production DAGs and one for Simulation/UAT DAGs.

Airflow can be painful to set up, and the complexity grows quickly if you use Docker to do it. While following different articles on setting up separate environments with Docker, I got frustrated because I kept running into container issues. Docker is a great tool, but it can obscure basic troubleshooting and the synchronization of all the different Airflow components.

I simply created 2 on-premise Airflow environments that share the same Postgres server but use separate databases. This setup does not use Docker at all and is relatively straightforward.

This shows how to spin up 2 separate Airflow envs using the Saltstack config management tool.

This will install Airflow version 2.0.0 (latest as of Feb 2021)

All Saltstack code for this article is hosted here:

https://github.com/perfecto25/airflow

To install Airflow you will need to have the following installed:

  • Postgres 10 or up (I’m running Postgres 10)
  • Python 3.6 or higher

We will set up our 2 Airflow envs using the following folder structure:

/opt/airflow/prod
/opt/airflow/sim

Inside each folder you have separate .env files, DAGs, configs, etc.

Here’s what the final layout of each env will look like:

[Figure: Airflow SIM]

Both envs connect to the same Postgres server but use 2 different databases.

Create the folder structure (this can be done via Saltstack, but showing individual steps for clarity)

root@server> mkdir -p /opt/airflow/{sim,prod}
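The airflow.cfg shown later points each env at its own dags and logs subfolders, so it can help to create those up front. Here is a minimal sketch, assuming a hypothetical helper called make_env_dirs that takes the base path as a parameter (neither comes from the Salt formula):

```shell
#!/bin/bash
# Hypothetical helper (not from the Salt formula): create the per-env
# folder tree under a base directory.
make_env_dirs() {
    local base="$1"
    local env
    for env in sim prod; do
        # dags/ and logs/ match the dags_folder and base_log_folder settings
        mkdir -p "$base/$env/dags" "$base/$env/logs"
    done
}

# Usage (as root on the server):
#   make_env_dirs /opt/airflow
```

Since mkdir -p is idempotent, running the helper again on an existing tree is harmless.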

Install Airflow using VirtualEnv (always good practice to separate your python libs)

We can install a single instance of the Airflow application and share it between both environments. What we keep separate is the database and the configs.

root@server> cd /opt/airflow
root@server> python3.6 -m virtualenv venv
root@server> source venv/bin/activate
(venv) root@server> pip install 'apache-airflow[postgres]==2.0.0'

This will install all required libs into your venv located at /opt/airflow/venv

Now create a symlink so the OS can find the “airflow” command:

ln -s /opt/airflow/venv/bin/airflow /usr/bin/airflow

Create 2 databases, SIM and Prod (don’t forget to add “;” to the end of each SQL statement)

# create DB user
root@server> sudo -u postgres createuser airflow
root@server> su postgres
postgres@server> psql
# give Airflow user a password
postgres=# alter user airflow with encrypted password 'airflow';
# create Databases
postgres=# create database airflowsim;
postgres=# create database airflowprod;
postgres=# grant all privileges on database airflowsim to airflow;
postgres=# grant all privileges on database airflowprod to airflow;

Exit psql by typing “\q”
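The same database statements can also be run without an interactive psql session. Here is a minimal sketch, assuming the airflow DB user already exists and using a hypothetical helper called airflow_db_sql that only prints the SQL, so you can review it before piping it to psql:

```shell
#!/bin/bash
# Hypothetical helper: print the SQL that creates one env's database.
# Database names follow the article's "airflow<env>" convention.
airflow_db_sql() {
    local env="$1"
    printf 'CREATE DATABASE airflow%s;\n' "$env"
    printf 'GRANT ALL PRIVILEGES ON DATABASE airflow%s TO airflow;\n' "$env"
}

# Usage (as root, with Postgres running):
#   airflow_db_sql sim  | sudo -u postgres psql
#   airflow_db_sql prod | sudo -u postgres psql
```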

Add an .env file to your Airflow env,

vim /opt/airflow/sim/.env

## Airflow-sim
export AIRFLOW_CONFIG=/opt/airflow/sim/airflow.cfg
export AIRFLOW_HOME=/opt/airflow/sim
export AIRFLOW__WEBSERVER__NAVBAR_COLOR="#32a8a2"

This .env file sets AIRFLOW_HOME and AIRFLOW_CONFIG, plus any additional parameters you want to customize, like the header color (I like to visually separate Prod and Sim by making Prod red and Sim blue).
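Because the two .env files differ only in the env name and the color, they can be generated from one template. Here is a minimal sketch with a hypothetical helper called write_env_file. The prod color in the usage lines is an arbitrary example, not a value from the repo:

```shell
#!/bin/bash
# Hypothetical helper: write the .env file for one env.
# Arguments: env name, base directory, navbar color.
write_env_file() {
    local env="$1" base="$2" color="$3"
    cat > "$base/$env/.env" <<EOF
## Airflow-$env
export AIRFLOW_CONFIG=$base/$env/airflow.cfg
export AIRFLOW_HOME=$base/$env
export AIRFLOW__WEBSERVER__NAVBAR_COLOR="$color"
EOF
}

# Usage (the prod color is an arbitrary example, pick your own):
#   write_env_file sim  /opt/airflow "#32a8a2"
#   write_env_file prod /opt/airflow "#a83232"
```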

Now add the Airflow config file. This is a very large file and you can reference it via the Github repo above, but the most important variables to change are the Postgres connection string, the web server port and the path to your DAGs (you want to make sure you don’t combine the 2 environments and that you keep them completely separated).

vim /opt/airflow/sim/airflow.cfg (see Github repo for full example)

dags_folder = /opt/airflow/sim/dags
base_log_folder = /opt/airflow/sim/logs
# user:password are the DB credentials created above
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflowsim
# SIM interface will run on 8095, Prod will run on 8090
base_url = http://<your server IP or hostname>:8095
web_server_port = 8095
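Since the whole point is to keep the 2 environments apart, a quick check that one env’s config never mentions the other env can catch copy-paste mistakes. Here is a minimal sketch, assuming the /opt/airflow/<env> paths and airflow<env> database names used in this article. check_env_isolation is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical sanity check: fail if an env's airflow.cfg mentions
# the other env's folder or database.
# Arguments: path to the config file, name of the *other* env.
check_env_isolation() {
    local cfg="$1" other="$2"
    if grep -Eq "/opt/airflow/$other|airflow$other" "$cfg"; then
        echo "ERROR: $cfg references the $other env" >&2
        return 1
    fi
    echo "OK: $cfg does not reference $other"
}

# Usage:
#   check_env_isolation /opt/airflow/sim/airflow.cfg prod
#   check_env_isolation /opt/airflow/prod/airflow.cfg sim
```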

Also add 2 keys to airflow.cfg: a Fernet key and a secret key.

to generate Fernet key run,

pip install cryptography
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

add the output to fernet_key variable

To generate secret_key, run

openssl rand -hex 30

add the output to secret_key variable
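A Fernet key is 32 random bytes encoded as URL-safe base64, so as an alternative sketch (not the method shown above) one can be produced with coreutils alone. generate_fernet_key is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: build a Fernet-compatible key without Python.
# 32 random bytes -> base64 -> swap '+' and '/' for the URL-safe '-' and '_'.
generate_fernet_key() {
    head -c 32 /dev/urandom | base64 | tr '+/' '-_'
}

# Usage:
#   generate_fernet_key    # paste the output into fernet_key
```

The result is 44 characters long, padding included, which is the same shape the Python one-liner prints.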

Now let’s initialize the SIM database

# if you are still in the virtualenv, leave it (typing “exit” here would close your shell)
(venv) root@server> deactivate
# source the Environment file that contains Airflow variables
root@server> source /opt/airflow/sim/.env
# activate venv and initialize the database
root@server> source /opt/airflow/venv/bin/activate
(venv) root@server> airflow db init

This will populate the SIM database with all required Airflow tables
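With 2 envs on one box it is easy to run a command while the wrong .env is still sourced in your shell. One way around that, sketched here with a hypothetical wrapper called run_in_env, is to load the variables in a subshell for just one command:

```shell
#!/bin/bash
# Hypothetical wrapper: run one command with an env's variables loaded.
# The subshell keeps the variables from leaking into your login shell,
# which avoids accidentally running a prod command with sim settings.
# Arguments: path to the .env file, then the command and its arguments.
run_in_env() {
    local envfile="$1"
    shift
    (
        . "$envfile" || exit 1
        "$@"
    )
}

# Usage (with the venv active):
#   run_in_env /opt/airflow/sim/.env  airflow db init
#   run_in_env /opt/airflow/prod/.env airflow db init
```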

create OS user called “airflow”

useradd airflow

Add 3 systemd service files: Webserver, Scheduler, Worker

Webserver

vim /usr/lib/systemd/system/airflowsim-webserver.service

[Unit]
Description=Airflow-sim webserver daemon
After=network.target postgresql-10.service
Wants=postgresql-10.service
[Service]
EnvironmentFile=/opt/airflow/sim/.env
User=root
Group=root
Type=simple
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
ExecStart=/bin/bash -c 'source /opt/airflow/sim/.env;source /opt/airflow/venv/bin/activate;airflow webserver --pid /run/airflow/webserver-sim.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target

Scheduler

vim /usr/lib/systemd/system/airflowsim-scheduler.service

[Unit]
Description=Airflow-sim scheduler daemon
After=network.target postgresql-10.service
Wants=postgresql-10.service
[Service]
EnvironmentFile=/opt/airflow/sim/.env
User=root
Group=root
Type=simple
ExecStart=/bin/bash -c 'source /opt/airflow/sim/.env;source /opt/airflow/venv/bin/activate;airflow scheduler'
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target

Worker (if not using Celery, can skip this)

vim /usr/lib/systemd/system/airflowsim-worker.service

[Unit]
Description=Airflow-sim celery worker daemon
After=network.target postgresql-10.service
Wants=postgresql-10.service
[Service]
EnvironmentFile=/opt/airflow/sim/.env
User=root
Group=root
Type=simple
ExecStart=/bin/bash -c 'export C_FORCE_ROOT=True;source /opt/airflow/sim/.env;source /opt/airflow/venv/bin/activate;airflow celery worker'
Restart=on-failure
RestartSec=10s
[Install]
WantedBy=multi-user.target

Each of these service files sources the env’s .env file and the virtualenv before running the service.
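The unit files for Prod differ from the SIM ones only in the env name, so they can be rendered from one template, which is roughly what the Salt formula does with Jinja. Here is a minimal sketch for the scheduler unit, using a hypothetical helper called render_scheduler_unit and the same layout as the SIM unit above:

```shell
#!/bin/bash
# Hypothetical helper: print the scheduler unit file for a given env,
# using the same layout as the SIM unit above.
render_scheduler_unit() {
    local env="$1"
    cat <<EOF
[Unit]
Description=Airflow-$env scheduler daemon
After=network.target postgresql-10.service
Wants=postgresql-10.service

[Service]
EnvironmentFile=/opt/airflow/$env/.env
User=root
Group=root
Type=simple
ExecStart=/bin/bash -c 'source /opt/airflow/$env/.env;source /opt/airflow/venv/bin/activate;airflow scheduler'
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
#   render_scheduler_unit prod > /usr/lib/systemd/system/airflowprod-scheduler.service
#   systemctl daemon-reload
```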

Reload systemd to pick up the changes

systemctl daemon-reload

Now add a combined services script. It wraps all 3 services into 1 easy-to-use command:

vi /opt/airflow/sim/service

#!/bin/bash
# Control all Airflow SIM services together: start | stop | restart | status
action=${1:-status}
services="webserver scheduler worker"

start() {
    for svc in $services; do
        echo "starting airflow SIM $svc"
        systemctl start "airflowsim-$svc"
    done
}

stop() {
    for svc in $services; do
        echo "stopping airflow SIM $svc"
        systemctl stop "airflowsim-$svc"
    done
}

status() {
    for svc in $services; do
        systemctl status "airflowsim-$svc"
    done
}

# quoting "$action" keeps odd arguments from breaking the comparison
case "$action" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    status)  status ;;
    *)       echo "invalid command (start|stop|restart|status)" ;;
esac

chmod +x /opt/airflow/sim/service

To start all Airflow SIM services run

/opt/airflow/sim/service start

to check for errors or startup messages, you can tail journal log

journalctl -f

Airflow SIM should start up and you can access the service via your browser

http://<your server>:8095 (or whatever port you configure for SIM)
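The webserver can take a little while to answer after the services start. Here is a small polling sketch, assuming curl is installed and using the webserver’s /health status endpoint. wait_for_url is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: poll a URL until it answers with an HTTP success
# code, or give up after a number of attempts.
# Arguments: URL, attempts (default 30), delay in seconds (default 2).
wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl return non-zero on HTTP errors such as 404 or 500
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $attempts attempt(s): $url" >&2
    return 1
}

# Usage:
#   wait_for_url "http://localhost:8095/health"
```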

[Screenshot: SIM interface]
[Screenshot: PROD interface]

To create user accounts, run the airflow users create command. The example below creates an Admin user (to create users with other Roles, see the Airflow documentation for Role types)

(venv) root@server> airflow users create -f Joe -l Smith -p abracadabra -r Admin -u jsmith -e jsmith@company.com

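To back up your Postgres DB for SIM, just run a pg_dump command

root@server> mkdir /opt/airflow/sim/backups
# pg_dump runs as the postgres user, so that user must be able to write to the folder
root@server> chown postgres /opt/airflow/sim/backups
root@server> runuser -l postgres  -c 'pg_dump -O -F c -f /opt/airflow/sim/backups/backup.dat -Z 3 --blobs -p 5432 -h localhost -d airflowsim'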
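A single backup.dat gets overwritten on every run. If you want dated backups and a simple retention policy, here is a minimal sketch. The helper names and the 14 day retention in the usage lines are assumptions, not part of the Salt formula:

```shell
#!/bin/bash
# Hypothetical helpers for dated backups. Paths are parameters so the
# same functions work for sim and prod.

# Print a backup path such as <dir>/airflowsim-2021-02-01.dat
backup_path() {
    local dir="$1" db="$2"
    printf '%s/%s-%s.dat\n' "$dir" "$db" "$(date +%F)"
}

# Delete *.dat backups in <dir> last modified more than <days> days ago.
prune_backups() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name '*.dat' -mtime +"$days" -delete
}

# Usage (as root; same pg_dump flags as above):
#   out=$(backup_path /opt/airflow/sim/backups airflowsim)
#   runuser -l postgres -c "pg_dump -O -F c -f $out -Z 3 --blobs -p 5432 -h localhost -d airflowsim"
#   prune_backups /opt/airflow/sim/backups 14
```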

You now have a working SIM instance!

To create a Prod environment, follow the same steps but replace “sim” with “prod” in all files, commands and configs. Don’t forget to change the DB connection string to point to “airflowprod”, and to change the webserver port (8090 for Prod) and other variables.
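If you do the replacement with sed rather than by hand, target only the env-specific tokens. A blanket s/sim/prod/ would also rewrite unrelated words such as Type=simple in the unit files. Here is a minimal sketch with a hypothetical helper called sim_to_prod, assuming the paths, database names and ports used in this article:

```shell
#!/bin/bash
# Hypothetical helper: derive a prod file from its sim counterpart.
# Only env-specific tokens are replaced; the port is matched in context
# so a random key that happens to contain "8095" is left alone.
sim_to_prod() {
    sed -e 's#/opt/airflow/sim#/opt/airflow/prod#g' \
        -e 's#airflowsim#airflowprod#g' \
        -e 's#Airflow-sim#Airflow-prod#g' \
        -e 's#-sim\.pid#-prod.pid#g' \
        -e 's#:8095#:8090#g' \
        -e 's#^web_server_port = 8095#web_server_port = 8090#' "$1"
}

# Usage:
#   sim_to_prod /opt/airflow/sim/airflow.cfg > /opt/airflow/prod/airflow.cfg
```

Review the result by hand afterwards. Values such as the navbar color are not touched by this sketch.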

If you use Salt, you can easily create both environments by cloning the above repo to your formula directory and running:

salt <target> state.sls formula.airflow.sim
salt <target> state.sls formula.airflow.prod

This formula will create everything mentioned in this article. Salt uses Jinja variables to separate environments and config variables.

Hope this helps your setup.
