We are running Airflow 2.3.3 on Ubuntu 22.04. Airflow's webserver uses OAuth authentication (and authorization) against Azure AD. This works perfectly fine when invoking the Airflow webserver from the command line with airflow webserver -D (as user ubuntu).
Now we want to build a systemd service so that the Airflow webserver starts automatically when the server boots. This is our service config file /lib/systemd/system/airflow-webserver.service:
[Unit]
Description=Airflow webserver daemon
After=network.target
Before=airflow-scheduler.service
[Service]
EnvironmentFile=/home/ubuntu/airflow/airflow.env
User=ubuntu
Group=ubuntu
Type=simple
ExecStart=/usr/bin/python /home/ubuntu/.local/bin/airflow webserver -D
Restart=on-failure
RestartSec=5s
PrivateTmp=false
StandardOutput=file:/home/ubuntu/airflow/logs/webserver/systemd-stdout.log
StandardError=file:/home/ubuntu/airflow/logs/webserver/systemd-errout.log
[Install]
WantedBy=multi-user.target
airflow.env holds these variables:
export AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
export AIRFLOW_HOME=/home/ubuntu/airflow
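One thing worth checking: unlike a shell, systemd parses EnvironmentFile= itself and expects plain newline-separated KEY=value assignments; depending on the systemd version, lines starting with the export keyword may be warned about or silently ignored. A safer form of the file (same values, just without shell syntax) would be:

AIRFLOW_CONFIG=/home/ubuntu/airflow/airflow.cfg
AIRFLOW_HOME=/home/ubuntu/airflow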
When starting the service (sudo systemctl start airflow-webserver) the webserver does come up and also shows Airflow's sign-in screen, but when it comes to authentication against Azure AD, we run into a timeout.
I can't figure out the difference between running the webserver from the command line vs. running it as a systemd service. How do I make sure systemd runs the webserver with exactly the same configuration as when I run it from the command line?
Update (2022-10-04):
The output of sudo journalctl -f -u airflow-webserver looks fine; there are a few hints regarding a third-party plugin which we can ignore (the same messages appear when running the webserver from the command line):
ubuntu@xxx:~/airflow$ sudo journalctl -f -u airflow-webserver
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,734] {init_appbuilder.py:515} INFO - Registering class RedocView on menu
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,734] {init_appbuilder.py:515} INFO - Registering class RedocView on menu
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,735] {baseviews.py:302} INFO - Registering route /redoc ('GET',)
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,735] {baseviews.py:302} INFO - Registering route /redoc ('GET',)
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: /home/ubuntu/.local/lib/python3.10/site-packages/airflow/plugins_manager.py:256 DeprecationWarning: This decorator is deprecated.
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: In previous versions, all subclasses of BaseOperator must use apply_default decorator for the `default_args` feature to work properly.
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: In current version, it is optional. The decorator is applied automatically using the metaclass.
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: /home/ubuntu/.local/lib/python3.10/site-packages/airflow/providers_manager.py:614 DeprecationWarning: The provider airflow-provider-vaultspeed uses `hook-class-names` property in provider-info and has no `connection-types` one. The 'hook-class-names' property has been deprecated in favour of 'connection-types' in Airflow 2.2. Use **both** in case you want to have backwards compatibility with Airflow < 2.2
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,780] {providers_manager.py:623} WARNING - The connection_type 'snowflake' has been already registered by provider 'airflow-provider-vaultspeed.'
Oct 04 10:07:48 ip-10-194-84-28 airflow[1508]: [2022-10-04 10:07:48,801] {providers_manager.py:623} WARNING - The connection_type 'snowflake' has been already registered by provider 'airflow-provider-vaultspeed.'
Analyzing the processes in htop, I do see some difference (notice the gunicorn sub-process holding parameters from airflow.cfg).
I figured out that even though the service is run as the user specified by User= and Group=, by default it lacks most environment variables of a login shell (e.g. those set by ~/.bashrc or any script in /etc/profile.d/).
We finally found this solution, which starts the service with exactly the same environment variables as when it is run directly from an interactive shell:
ExecStart=/bin/bash -l -c 'exec "$@"' _ /some/folder/my-script.sh
bash -l starts a login shell and executes the command provided after -c.
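The trailing arguments deserve a note: with bash -c, the first word after the command string (here the underscore) becomes $0, and the remaining words become the positional parameters, so "$@" inside the command expands to the script path and its arguments. A quick demonstration with echo standing in for the real script:

```shell
# '_' fills $0; 'echo hello' becomes "$@" and is exec'ed by the login shell
bash -l -c 'exec "$@"' _ echo hello
# prints: hello (plus anything your profile scripts happen to print)
```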
I'm not really sure if the exec part is really required (it replaces the shell process with the command, so systemd tracks the service's actual PID and delivers signals directly to it) or if the script path could simply go directly after -c, but this solution works fine for us.
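Applied to the unit above, the wrapper pattern would look like this (an untested sketch; the rest of the [Service] section stays unchanged):

ExecStart=/bin/bash -l -c 'exec "$@"' _ /usr/bin/python /home/ubuntu/.local/bin/airflow webserver -D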