I am trying to use MLflow in Azure Databricks for a custom ML model I have created. However, I am new to MLflow, so to get an idea of how to save and log the model I have taken a small example from the MLflow documentation that I am trying to get working.
import pandas as pd
import mlflow

# Define the model class
class AddN(mlflow.pyfunc.PythonModel):

    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

model_input = pd.DataFrame([range(10)])
model_path = r"/FileStore/tmp/<folder path>"
add5_model = AddN(n=5)

with mlflow.start_run(run_name="test_forecast") as run:
    model_path = f"{model_path}-{run.info.run_uuid}"
    mlflow.log_param("algorithm", "AddN")
    mlflow.log_param("total_n_values", len(model_input))
    mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)
This block of code runs, and when I go to Experiments in Databricks I can see the parameters I logged under the correct run name. However, when I click the Version link I am taken to a completely blank Azure DevOps page. Why is this?
Further, when I go to the model_path folder I see no files. Currently I am setting the path to an empty folder in Azure Databricks DBFS, but I am unsure whether this is the correct approach and whether it is the reason I don't see any files.
I also want to log the model, which is why I added the following block of code:
reg_model_name = "ml_flow_AddN_test"
mlflow.pyfunc.log_model(artifact_path=model_path,
                        python_model=add5_model,
                        registered_model_name=reg_model_name)
However, when I run this I get the following error:
RestException: INVALID_PARAMETER_VALUE: Invalid value '/FileStore/tmp/<folder-path>-111a11a111111a1a1111aa11111a1a11/requirements.txt' for parameter: 'path'. Path must be relative.
(I have substituted the text after <folder-path> with a similar example to what I get; I assume it is some sort of MLflow ID?) How do I solve this error?
The artifact_path parameter of the mlflow.pyfunc.log_model function is defined as:
:param artifact_path: The run-relative artifact path to which to log the Python model.
That means it is just a name that identifies the model within that run, so it cannot be an absolute path like the one you passed in. Try something short, like add5_model:
reg_model_name = "ml_flow_AddN_test"
mlflow.pyfunc.log_model(artifact_path="add5_model",
                        python_model=add5_model,
                        registered_model_name=reg_model_name)
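If the artifact name is built from other strings, it may help to check it locally before calling log_model. The following is only a minimal sketch using the standard library: is_run_relative and short_artifact_name are hypothetical helpers, not part of MLflow, and they only approximate the server-side "Path must be relative" check by rejecting absolute paths and ".." components.

```python
import posixpath


def is_run_relative(artifact_path: str) -> bool:
    """Return True if artifact_path looks like a run-relative path.

    Hypothetical helper (not an MLflow API): approximates the check by
    rejecting empty strings, absolute paths and ".." components.
    """
    if not artifact_path or posixpath.isabs(artifact_path):
        return False
    return ".." not in artifact_path.split("/")


def short_artifact_name(path: str) -> str:
    """Derive a short run-relative name from the last path component."""
    name = posixpath.basename(path.rstrip("/"))
    if not name:
        raise ValueError(f"cannot derive an artifact name from {path!r}")
    return name


# The absolute path from the question fails the check, while a short
# name such as "add5_model" passes it.
print(is_run_relative("/FileStore/tmp/model-123"))  # absolute path
print(is_run_relative("add5_model"))                # run-relative name
```

A value that passes is_run_relative can then be given as artifact_path, while an absolute location such as the DBFS folder stays reserved for the path argument of save_model.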