
`mlflow server` - Difference between `--default-artifact-root` and `--artifacts-destination`


I am using mlflow server to set up an MLflow tracking server. mlflow server has two command options that accept an artifact URI: --default-artifact-root <URI> and --artifacts-destination <URI>.

From my understanding, --artifacts-destination is used when the tracking server is serving the artifacts.

Based on scenarios 4 and 5 in the MLflow Tracking documentation:

mlflow server --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb --default-artifact-root s3://bucket_name --host remote_host --no-serve-artifacts

# Artifact access is enabled through the proxy URI 'mlflow-artifacts:/',
# giving users access to this location without having to manage credentials
# or permissions.
mlflow server \
  --backend-store-uri postgresql://user:password@postgres:5432/mlflowdb \
  --artifacts-destination s3://bucket_name \
  --host remote_host

In both scenarios, --default-artifact-root and --artifacts-destination take an S3 bucket URI, s3://bucket_name, as the argument. I fail to see why we need two separate command options for setting the artifact URI.

Their descriptions are

--default-artifact-root <URI>
Directory in which to store artifacts for any new experiments created. For tracking server backends that rely on SQL, this option is required in order to store artifacts. Note that this flag does not impact already-created experiments with any previous configuration of an MLflow server instance. By default, data will be logged to the mlflow-artifacts:/ uri proxy if the --serve-artifacts option is enabled. Otherwise, the default location will be ./mlruns.

--artifacts-destination <URI>
The base artifact location from which to resolve artifact upload/download/list requests (e.g. ‘s3://my-bucket’). Defaults to a local ‘./mlartifacts’ directory. This option only applies when the tracking server is configured to stream artifacts and the experiment’s artifact root location is http or mlflow-artifacts URI.

What is the reason for having the two command options? What happens if both are specified: does one URI take precedence over the other?


Solution

  • It looks confusing at first because the two options give you a lot of flexibility.
    You can use both of them or only one of them. Let's explain it a bit more :-)

    • --default-artifact-root is the artifact root recorded for every new experiment. Experiments that already exist keep the location they were created with.
      • NOTE: The default value depends on whether --serve-artifacts is enabled (mlflow-artifacts:/) or not (./mlruns).
    • --artifacts-destination is the base location where the tracking server itself stores and reads artifacts when it handles artifact upload/download/list requests over HTTP.
      • NOTE: This option only applies when the tracking server is configured to stream artifacts (--serve-artifacts is enabled) AND the experiment’s artifact root location is an http or mlflow-artifacts URI.
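
To make the precedence question concrete, the two option descriptions quoted in the question can be modelled as a small decision function. This is only a simplified sketch in Python, not MLflow's actual implementation: the names `effective_artifact_root` and `storage_base` are hypothetical, and the only rules encoded are the defaults and the applicability condition stated in those descriptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Schemes for which, per the option description, --artifacts-destination applies.
PROXIED_SCHEMES = {"http", "https", "mlflow-artifacts"}


def effective_artifact_root(default_artifact_root: Optional[str],
                            serve_artifacts: bool) -> str:
    """Artifact root recorded for a NEW experiment (hypothetical helper)."""
    if default_artifact_root is not None:
        return default_artifact_root
    # Documented defaults: proxy URI when serving artifacts, else ./mlruns.
    return "mlflow-artifacts:/" if serve_artifacts else "./mlruns"


def storage_base(default_artifact_root: Optional[str],
                 artifacts_destination: Optional[str],
                 serve_artifacts: bool) -> str:
    """Base location where artifacts of a new experiment end up (hypothetical helper)."""
    root = effective_artifact_root(default_artifact_root, serve_artifacts)
    proxied = serve_artifacts and urlparse(root).scheme in PROXIED_SCHEMES
    if proxied:
        # The server handles the request and resolves it against
        # --artifacts-destination (documented default: ./mlartifacts).
        return artifacts_destination or "./mlartifacts"
    # Otherwise the artifact root is used directly and
    # --artifacts-destination plays no role.
    return root


if __name__ == "__main__":
    # Only --artifacts-destination given, artifacts served by the server.
    print(storage_base(None, "s3://my-root-bucket", True))
    # Both given, but the root is a plain s3:// URI.
    print(storage_base("s3://bucket_a", "s3://bucket_b", True))
```

Under this reading, neither option overrides the other. When both are given and --default-artifact-root is an s3:// URI, new experiments use that URI directly and --artifacts-destination is never consulted for them.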

    Case 1: Use both --default-artifact-root & --artifacts-destination:

    mlflow server \
      --default-artifact-root mlflow-artifacts:/ \
      --artifacts-destination s3://my-root-bucket \
      --host remote_host \
      --serve-artifacts
    

    Case 2: Use only --artifacts-destination

    mlflow server \
      --artifacts-destination s3://my-root-bucket \
      --host remote_host \
      --serve-artifacts
    

    Case 3: Use only --default-artifact-root

    mlflow server \
      --default-artifact-root s3://my-root-bucket/mlartifacts \
      --serve-artifacts
    

    In this case the server can resolve all the following patterns to the configured proxied object store location of s3://my-root-bucket/mlartifacts:

    https://<host>:<port>/mlartifacts
    http://<host>/mlartifacts
    mlflow-artifacts://<host>/mlartifacts
    mlflow-artifacts://<host>:<port>/mlartifacts
    mlflow-artifacts:/mlartifacts
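
The patterns listed above differ only in scheme and host; the path component is the same in each. One quick way to see this is to parse them with Python's standard library. This is an illustrative sketch only: `remote_host` and port `5000` are placeholder values standing in for <host> and <port>, and the snippet neither contacts an MLflow server nor reproduces how MLflow resolves these URIs internally.

```python
from urllib.parse import urlparse

# Placeholder values standing in for <host> and <port> (assumptions).
HOST, PORT = "remote_host", 5000

patterns = [
    f"https://{HOST}:{PORT}/mlartifacts",
    f"http://{HOST}/mlartifacts",
    f"mlflow-artifacts://{HOST}/mlartifacts",
    f"mlflow-artifacts://{HOST}:{PORT}/mlartifacts",
    "mlflow-artifacts:/mlartifacts",
]

# Split each URI into scheme, host and path.
parsed = [urlparse(p) for p in patterns]

for original, parts in zip(patterns, parsed):
    print(f"{original:50} scheme={parts.scheme!r} "
          f"host={parts.hostname!r} path={parts.path!r}")
```

Every entry yields the same path, and the last form carries no host at all, so the client has to take the host from somewhere else, such as its configured tracking URI.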