Tags: python, systemd, ubuntu-20.04, mlflow

Running mlflow as a systemd service - gunicorn not found


I am trying to run an MLflow tracking server, installed inside a virtual environment, as a systemd service on Ubuntu 20.04, but I get an error saying that it cannot find gunicorn. Here is the journal output:

nov 27 10:37:17 Atrium-Power mlflow[81375]: Traceback (most recent call last):
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/bin/mlflow", line 8, in <module>
nov 27 10:37:17 Atrium-Power mlflow[81375]:     sys.exit(cli())
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/click/core.py", line 829, in __call__
nov 27 10:37:17 Atrium-Power mlflow[81375]:     return self.main(*args, **kwargs)
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/click/core.py", line 782, in main
nov 27 10:37:17 Atrium-Power mlflow[81375]:     rv = self.invoke(ctx)
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
nov 27 10:37:17 Atrium-Power mlflow[81375]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
nov 27 10:37:17 Atrium-Power mlflow[81375]:     return ctx.invoke(self.callback, **ctx.params)
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/click/core.py", line 610, in invoke
nov 27 10:37:17 Atrium-Power mlflow[81375]:     return callback(*args, **kwargs)
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/mlflow/cli.py", line 392, in server
nov 27 10:37:17 Atrium-Power mlflow[81375]:     _run_server(
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/mlflow/server/__init__.py", line 138, in _run_server
nov 27 10:37:17 Atrium-Power mlflow[81375]:     exec_cmd(full_command, env=env_map, stream_output=True)
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/site-packages/mlflow/utils/process.py", line 34, in exec_cmd
nov 27 10:37:17 Atrium-Power mlflow[81375]:     child = subprocess.Popen(
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/subprocess.py", line 947, in __init__
nov 27 10:37:17 Atrium-Power mlflow[81375]:     self._execute_child(args, executable, preexec_fn, close_fds,
nov 27 10:37:17 Atrium-Power mlflow[81375]:   File "/home/praxasense/.miniconda3/envs/mlflow-server/lib/python3.9/subprocess.py", line 1819, in _execute_child
nov 27 10:37:17 Atrium-Power mlflow[81375]:     raise child_exception_type(errno_num, err_msg, err_filename)
nov 27 10:37:17 Atrium-Power mlflow[81375]: FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn'

and my systemd unit file is this:

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=33

[Service]
User=praxasense
WorkingDirectory=/home/praxasense
Restart=always
RestartSec=5
ExecStart=/home/praxasense/.miniconda3/envs/mlflow-server/bin/mlflow server --port 3569 --backend-store-uri .mlruns

[Install]
WantedBy=multi-user.target

The strange thing is that if I run the ExecStart command in my terminal, it works fine in fish shell but not in bash. If I first run conda activate mlflow-server and then mlflow ..., it does work. As far as I understand, the Python interpreter should be aware of its virtual environment, so this should work the way I tried it, but apparently I am missing something that keeps it from finding the gunicorn package, which is definitely installed.

Any ideas?


Solution

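  • Try putting the venv's bin directory on the PATH that systemd gives the service. As the traceback shows, mlflow launches gunicorn by name through subprocess.Popen, so the executable has to be found on PATH; calling mlflow by its absolute path does not add the environment's bin directory to PATH: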

    [Service]
    ...
    Environment="PATH=/home/praxasense/.miniconda3/envs/mlflow-server/bin"
    ...
    

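    I also recommend setting KillMode=mixed, since MLflow spawns gunicorn processes that otherwise won't be terminated when you stop the service. With mixed, systemd sends SIGTERM to the main process and then SIGKILL to any processes remaining in the unit's control group, so the child processes are terminated as well.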
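
To illustrate why the PATH setting matters, here is a minimal Python sketch of the lookup involved. It is not MLflow's code: the helper names resolve_command and demo are hypothetical, and the temporary directories and the fake gunicorn script only stand in for the environment's bin directory. It uses shutil.which to show that a bare command name resolves only when its directory is on the search path.

```python
import os
import shutil
import stat
import tempfile


def resolve_command(name, path):
    """Return the full path of `name` if it is found on `path`, else None."""
    return shutil.which(name, path=path)


def demo():
    # Two throwaway directories: one stands in for the env's bin directory,
    # the other for a PATH that does not contain gunicorn.
    with tempfile.TemporaryDirectory() as env_bin, \
            tempfile.TemporaryDirectory() as other:
        # Create a fake, executable "gunicorn" (a stand-in, not the real one).
        fake = os.path.join(env_bin, "gunicorn")
        with open(fake, "w") as fh:
            fh.write("#!/bin/sh\nexit 0\n")
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)

        # Lookup with a PATH that lacks the env's bin directory.
        without_env = resolve_command("gunicorn", path=other)
        # Lookup with the env's bin directory prepended to that PATH.
        with_env = resolve_command(
            "gunicorn", path=os.pathsep.join([env_bin, other])
        )
        return without_env, with_env, fake


if __name__ == "__main__":
    missing, found, _ = demo()
    print("without env bin on PATH:", missing)
    print("with env bin on PATH:   ", found)
```

The first lookup corresponds to the failing service, where the bare name gunicorn cannot be resolved. The second corresponds to the service after the Environment="PATH=..." line has been added.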