Tags: python, databricks, azure-databricks, mlflow

Saving and logging a custom MLflow model


I am trying to use MLflow in Azure Databricks for a custom ML model I have created. I am new to MLflow, so to get an idea of how to save and log a model I have built a small example, based on the MLflow documentation, that I am trying to get working.

import pandas as pd
import mlflow

# Define the model class
class AddN(mlflow.pyfunc.PythonModel):

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

model_input = pd.DataFrame([range(10)])
model_path = r"/FileStore/tmp/<folder path>"
add5_model = AddN(n=5)

with mlflow.start_run(run_name="test_forecast") as run:
    model_path = f"{model_path}-{run.info.run_uuid}"
    mlflow.log_param("algorithm", "AddN")
    mlflow.log_param("total_n_values", len(model_input))
    mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)

This block of code runs, and when I go to Experiments in Databricks I can see the parameters I logged under the correct run name. But when I click the Version link, I am taken to a completely blank Azure DevOps page. Why is this?

Further, when I go to the model_path folder I see no files. Currently I am setting the path to an empty folder in Azure Databricks DBFS, but I am unsure whether this is the correct approach and whether it is the reason I don't see any files.

I also want to log the model, which is why I added the following block of code:

reg_model_name = "ml_flow_AddN_test"

mlflow.pyfunc.log_model(artifact_path=model_path,
                        python_model=add5_model,
                        registered_model_name=reg_model_name)

However, when I run this I get the following error:

RestException: INVALID_PARAMETER_VALUE: Invalid value '/FileStore/tmp/<folder-path>-111a11a111111a1a1111aa11111a1a11/requirements.txt' for parameter: 'path'. Path must be relative.

(I have replaced the text after <folder-path> with a made-up value similar to the one I actually get; I assume it is some sort of MLflow ID.) How do I solve this error?


Solution

  • The artifact_path parameter of mlflow.pyfunc.log_model is documented as:

    :param artifact_path: The run-relative artifact path to which to log the Python model.
    

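    That means it is just a name that identifies the model within that run, so it cannot be an absolute path like the one you passed in. The error message shows the consequence: the model file requirements.txt was addressed as '<your absolute path>/requirements.txt', which the server rejects because the path must be relative. Try something short like add5_model: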

    reg_model_name = "ml_flow_AddN_test"
    
    mlflow.pyfunc.log_model(artifact_path="add5_model",
                            python_model=add5_model,
                            registered_model_name=reg_model_name)
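
If you would rather keep deriving the name from an existing save location, one option is to strip it down to its last path component before logging. The following is only a minimal sketch: to_artifact_path and log_pyfunc_model are hypothetical helper names, not part of MLflow, and the sketch assumes an MLflow version in which log_model accepts artifact_path, as in the question.

```python
import posixpath


def to_artifact_path(path: str) -> str:
    """Return a run-relative artifact name derived from `path`.

    Hypothetical helper (not part of MLflow): it keeps only the last
    path component, so an absolute save location is reduced to a short
    relative name.
    """
    name = posixpath.basename(path.rstrip("/"))
    if not name:
        raise ValueError(f"cannot derive an artifact name from {path!r}")
    return name


def log_pyfunc_model(python_model, path: str, registered_model_name=None):
    """Log `python_model` under a run-relative artifact path.

    Hypothetical wrapper around mlflow.pyfunc.log_model; call it inside
    an active run, e.g. within `with mlflow.start_run():`.
    """
    # Imported here so that to_artifact_path has no MLflow dependency.
    import mlflow.pyfunc

    return mlflow.pyfunc.log_model(
        artifact_path=to_artifact_path(path),
        python_model=python_model,
        registered_model_name=registered_model_name,
    )
```

For example, to_artifact_path("/FileStore/tmp/models/add5_model") yields "add5_model". With the answer's artifact_path, the logged model should then be addressable through a URI of the form runs:/<run id>/add5_model, for instance when loading it back with mlflow.pyfunc.load_model.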