MLflow Registry high availability

I am running the MLflow registry using mlflow server (https://mlflow.org/docs/latest/model-registry.html). The server runs fine. If it crashes for any reason, it restarts automatically, but it is unavailable while it restarts.

Is it possible to run multiple instances in parallel behind a load balancer? Is this safe, or could it lead to inconsistencies?

Solution

Yes, it's possible to run multiple instances of the MLflow tracking server behind a load balancer.

Because the tracking server is stateless and holds no state of its own, multiple instances can log to the same replicated primary DB as their backend store. A hot standby can take over if the primary fails.
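As a rough illustration of "multiple instances, one store", the sketch below builds the command line for several `mlflow server` processes that all point at the same backend store. The helper name `mlflow_server_commands`, the connection string, the bucket name, and the ports are hypothetical placeholders, not values from the question; only the `mlflow server` flags themselves come from the MLflow CLI.

```python
from typing import List


def mlflow_server_commands(
    backend_store_uri: str,
    artifact_root: str,
    ports: List[int],
    host: str = "0.0.0.0",
) -> List[List[str]]:
    """Build one `mlflow server` argv list per port.

    Every instance gets the same backend store and artifact root, so the
    server processes themselves stay interchangeable.
    """
    return [
        [
            "mlflow", "server",
            "--backend-store-uri", backend_store_uri,
            "--default-artifact-root", artifact_root,
            "--host", host,
            "--port", str(port),
        ]
        for port in ports
    ]


if __name__ == "__main__":
    # Hypothetical connection string and bucket; replace with your own.
    commands = mlflow_server_commands(
        "postgresql://mlflow:secret@db.example.internal:5432/mlflow",
        "s3://example-mlflow-artifacts",
        ports=[5000, 5001],
    )
    for argv in commands:
        print(" ".join(argv))
```

Each printed line can be run on a separate host or container; the load balancer then only needs the list of host:port pairs.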

How to set up replicated instances of your backend store depends on which one you choose, so we cannot document every scenario and configuration.

I would check the documentation for your backend DB and load balancer to see how to distribute requests across multiple instances of an MLflow tracking server, how to fail over to a hot standby or replicated DB, and how to configure a hot-standby replicated DB instance.
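To make the load balancer's role concrete, here is a minimal sketch of the kind of health check it performs: probe each instance and route to the first one that answers. It assumes the tracking server exposes a `/health` endpoint that returns HTTP 200 (worth verifying for your MLflow version). The function names and the instance URLs are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def first_healthy(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Optional[str]:
    """Return the first URL whose probe succeeds, or None if all fail."""
    for url in urls:
        if probe(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical instance addresses behind the load balancer.
    instances = ["http://mlflow-1.internal:5000", "http://mlflow-2.internal:5000"]
    print(first_healthy(instances))
```

In practice the load balancer does this for you; a client would simply pass the balancer's address to `mlflow.set_tracking_uri`. The probe is injectable so the selection logic can be exercised without a running server.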

The short of it: the MLflow tracking server is stateless.