This demo shows how one would use Airflow to copy data from one Postgres database to another. Docker containers are used to represent services in a production environment.
The docker-compose.yaml describes the service definitions. Most of it is adapted from the official Airflow docker-compose, with some minor tweaks, mainly the use of the LocalExecutor instead of the CeleryExecutor to reduce overhead since this is a toy application. For more information about the Airflow-provided docker-compose, see this page.
To set up the test environment, first run the startup script startup.sh if running from Linux. This creates three directories Airflow is going to use as volumes as specified in the docker-compose file and set the Airflow UID to the UID your user account is using.
you@host:$ docker-compose up airflow-inityou@host:$ docker-compose upyou@host:$ docker-compose down --volumes --remove-orphanspostgres: Airflow metadata serverairflow-webserver: Airflow webserver. Accessible atlocalhost:5884. Web UI login credentials:user: airflow; password: airflowairflow-scheduler: Airflow schedulerairflow-init: Performs some checks and initializes Airflowairflow-cli: Airflow CLI
postgres-source: Login credentials:user: postgres; password: postgrespostgres-target: Login credentials:user: postgres; password: postgrespg-admin: Web-based frontend for inspecting the databases. Accessible atlocalhost:5050. Login credentials:email: admin@admin.com; password: root
The demonstration can be found in dags/postgres_dag/postgres_dag.py. The DAG should show up automatically on the Airflow Web UI.
At startup, postgres-source gets populated with the sales table
| id | creation_date | sale_value |
|---|---|---|
| 1 | 2022-06-15 | 1234.56 |
| 2 | 2022-06-16 | 9876.54 |
You can view this in pg-admin by first linking to the Postgres database, then navigating to Servers/<registered server name>/Databases/postgres/Schemas/public/Tables/sales. To link the database to pg-admin
- Give the server a name in the
Generaltab (can be any name) - Go to the
Connectiontab and supply the following:
- Host name/address:
postgres-source(orpostgres-target) - Post:
5432 - Maintenance database:
postgres - Username:
postgres - Password:
postgres
The postgres-target database starts off empty, but after running the DAG, the sales table should be copied there.