I have a deployment of Airflow running in a Kubernetes cluster. I deployed it there using the official Helm chart (described here). I manage DAGs the recommended way, by "baking" them into a Docker image (described here).
When I create or change a DAG, I run the `docker build` and `docker push` commands as listed in the documentation. This all works great. However, my changes don't show up in the GUI until I delete the scheduler and the webserver pods (`kubectl delete pod airflow-scheduler-xxx`, etc.), forcing Kubernetes to spin up new ones. This is not ideal for CI/CD purposes, as I don't want to have to do this manually every time. Is there a way to pick up the changes automatically (e.g. by pulling the image periodically)?
I already tried to set `dag_dir_list_interval` in `airflow.cfg` (suggested here) through the Helm values, but this doesn't seem to change anything. `pullPolicy` is set to `Always` because I use the `latest` tag. It looks like this in my `override.yaml`:

    images:
      airflow:
        repository: -registry-name-
        tag: latest
        pullPolicy: Always
    config:
      scheduler:
        # how often (in seconds) new DAGs should be picked up from the filesystem
        min_file_process_interval: 0
        dag_dir_list_interval: 60

I was eventually able to figure this out myself. The pods weren't updating (i.e. weren't pulling the new image with the updated DAGs) because I was using the `latest` tag. After building and pushing the new image, nothing changed from the cluster's point of view: the tag in the pod spec was still the same, so Kubernetes had no reason to recreate the pods, and `pullPolicy: Always` only takes effect when a container starts. The solution is to stop using the `latest` tag and use a tag that changes every time CI/CD runs. I chose `$(Build.SourceVersion)` in Azure Pipelines, which corresponds to the git hash of the commit that triggered the pipeline. Then I added a `helm upgrade` command with a flag pointing to the new image tag. Because the image reference in the pod spec changes, this command causes the relevant pods to be replaced automatically.

    helm upgrade --install airflow apache-airflow/airflow --set images.airflow.tag="$(Build.SourceVersion)"
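
For anyone not on Azure Pipelines, the same idea can be sketched as a generic CI script. This is a minimal sketch, not my actual pipeline: the repository `registry.example.com/airflow-dags`, the `IMAGE_TAG` variable, the `DRY_RUN` switch, and the function names are all assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a CI step: build, push, and deploy an image tagged with the commit hash.
# REGISTRY, RELEASE, IMAGE_TAG and DRY_RUN are illustrative assumptions.
set -eu

REGISTRY="${REGISTRY:-registry.example.com/airflow-dags}"  # hypothetical image repository
RELEASE="${RELEASE:-airflow}"                              # Helm release name
DRY_RUN="${DRY_RUN:-1}"                                    # 1 = print commands, 0 = execute them

# Resolve the tag: use IMAGE_TAG if the CI system provides it
# (e.g. set from $(Build.SourceVersion)), otherwise ask git for the commit hash.
image_tag() {
  if [ -n "${IMAGE_TAG:-}" ]; then
    printf '%s\n' "$IMAGE_TAG"
  else
    git rev-parse HEAD
  fi
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build and push the image, then point the Helm release at the new tag.
deploy() {
  tag="$(image_tag)"
  run docker build --tag "${REGISTRY}:${tag}" .
  run docker push "${REGISTRY}:${tag}"
  run helm upgrade --install "$RELEASE" apache-airflow/airflow \
    --set images.airflow.repository="$REGISTRY" \
    --set images.airflow.tag="$tag"
}

# Only act when explicitly asked, e.g. `./ci-deploy.sh deploy`.
case "${1:-}" in
  deploy) deploy ;;
esac
```

Running `IMAGE_TAG=abc123 ./ci-deploy.sh deploy` prints the three commands with `abc123` as the tag; setting `DRY_RUN=0` executes them instead. Since the tag is different for every commit, the pod spec changes on each `helm upgrade`, which is what makes Kubernetes roll the pods.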