I have an existing model that was trained on Azure. I want to fully integrate it and start using it on Databricks. What's the best way to do this? How can I load the model into the Databricks model workflow? I have the model in a pickle file.
I have read almost all of the Databricks documentation, but 99% of it covers new models trained on Databricks, not importing existing models.
Since MLflow has a standardized model storage format, you just need to bring the model files over and start using them with the mlflow package. In addition, you can register the model in the workspace's model registry using mlflow.register_model() and then use it from there. These would be the steps:
1. Save your model locally with mlflow.sklearn.save_model() or mlflow.sklearn.autolog() (or some other mlflow.<flavor>). That should give you a folder that contains an MLmodel file and, depending on the flavor of the model, a few more files, like the below:

mlflow-model
├── MLmodel
├── conda.yaml
├── model.pkl
└── requirements.txt
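Since you already have the model as a pickle file, a minimal sketch of this step, assuming a scikit-learn model and a hypothetical local path model.pkl, could look like this:

import pickle
import mlflow.sklearn

# Load the existing model from the pickle file ("model.pkl" is an assumed path)
with open("model.pkl", "rb") as f:
    sk_model = pickle.load(f)

# Write it out in the MLflow model format to a local folder named "mlflow-model"
mlflow.sklearn.save_model(sk_model, "mlflow-model")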
Note: You can download the model from the AzureML Workspace using the v2 CLI like so:
az ml model download --name <model_name> --version <model_version>
2. Make sure your Databricks cluster has mlflow installed:
%pip install mlflow
3. Upload the MLflow model files to the DBFS connected to the cluster.
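One way to do the upload, assuming you have the Databricks CLI installed and configured (the target path is just an example), is:

databricks fs cp --recursive ./mlflow-model dbfs:/FileStore/shared_uploads/mlflow-model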
4. In a notebook, register the model using MLflow (adjust the dbfs: path to the location where the model was uploaded):
import mlflow
model_version = mlflow.register_model("dbfs:/FileStore/shared_uploads/mlflow-model/", "AzureMLModel")
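As an optional sanity check, you can list the versions registered under that name with the MLflow client:

from mlflow.tracking import MlflowClient

client = MlflowClient()
# Print every version registered under the name used above
for mv in client.search_model_versions("name='AzureMLModel'"):
    print(mv.version, mv.current_stage, mv.source)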
Now your model is registered in the workspace's model registry like any model that was created from a Databricks session, and you can load it from the registry like so:
model = mlflow.pyfunc.load_model(f"models:/AzureMLModel/{model_version.version}")
input_example = {
"sepal_length": [5.1,4.8],
"sepal_width": [3.5,4.4],
"petal_length": [1.4,2.0],
"petal_width": [0.2,0.1]
}
model.predict(input_example)
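The pyfunc predict() also accepts a pandas DataFrame, so an equivalent call, reusing the input_example dict from above, would be:

import pandas as pd

# One row per sample, one column per feature
model.predict(pd.DataFrame(input_example))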
Or use the model as a spark_udf:
import pandas as pd

# Wrap the registered model as a Spark UDF
model_udf = mlflow.pyfunc.spark_udf(spark=spark, model_uri=f"models:/AzureMLModel/{model_version.version}", result_type='string')
spark_df = spark.createDataFrame(pd.DataFrame(input_example))
# Calling model_udf() with no arguments works when the model has a signature;
# MLflow then selects the input columns by name from it.
spark_df = spark_df.withColumn('foo', model_udf())
display(spark_df)
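If the model was saved without a signature, the column mapping has to be explicit; a sketch, reusing the DataFrame from above, would be:

# Pass the feature columns to the UDF by name instead of relying on a signature
spark_df = spark_df.withColumn('foo', model_udf('sepal_length', 'sepal_width', 'petal_length', 'petal_width'))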
Note that I am using mlflow.pyfunc to load the model, since every MLflow model needs to support the pyfunc flavor. That way, you don't need to worry about the native flavor of the model.