python scikit-learn mlflow mlops

How can I save more metadata on an MLflow model?


I am trying to save a model to MLflow, but because I have a custom prediction pipeline that retrieves data, I need to save extra metadata with the model.

I tried using my own custom signature class, which does the job correctly and saves the model with the extra metadata in the MLmodel file (YAML format). But when I load the model from the MLflow registry, the signature is not easily accessible.

mlflow.sklearn.log_model(model, "model", signature=signature)

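I've also tried passing an extra dictionary to log_model, but it gets saved in the conda.yaml file. This is because the third positional parameter of mlflow.sklearn.log_model is conda_env, so the dictionary is treated as a Conda environment specification: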

mlflow.sklearn.log_model(model, "model", {"metadata1":"value1", "metadata2":"value2"})

Should I create my own flavour? Or my own Model subclass? I've seen here that PyFuncModel receives a metadata object and an implementation to solve this, but I don't know where I should pass my own implementations to PyFuncModel in an experiment script. Here's a minimal example:

import mlflow
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

metadata_dic = {"metadata1": "value1", 
                "metadata2": "value2"}

X = np.array([[-2, -1, 0, 1, 2, 1],[-2, -1, 0, 1, 2, 1]]).T
y = np.array([0, 0, 1, 1, 1, 0])

X = pd.DataFrame(X, columns=["X1", "X2"])
y = pd.Series(y, name="y")  # 1-D target avoids sklearn's column-vector warning


model = LogisticRegression()
model.fit(X, y)

mlflow.sklearn.log_model(model, "model")

Solution

  • In the end, I created a class that holds all the metadata and attached it to the model as an attribute:

    model = LogisticRegression()
    model.fit(X, y)
    # ModelMetadata is my own container class for the metadata (definition not shown)
    model.metadata = ModelMetadata(**metadata_dic)
    mlflow.sklearn.log_model(model, "model")
    

    With this approach I lost the customizable predict process, and after reading the MLflow documentation it is still not clear to me how to proceed.

    If anyone finds a good approach, it would be much appreciated.
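The solution does not show `ModelMetadata`. A minimal sketch, assuming it is a simple frozen dataclass (the field names are hypothetical and just mirror `metadata_dic`): since `mlflow.sklearn` pickles the estimator, an instance attribute such as `model.metadata` travels with it. Here a plain `pickle` round trip stands in for `log_model`/`load_model` so the idea can be checked without a tracking server.

```python
import pickle
from dataclasses import asdict, dataclass

import numpy as np
from sklearn.linear_model import LogisticRegression


@dataclass(frozen=True)
class ModelMetadata:
    # Hypothetical fields mirroring metadata_dic from the question.
    metadata1: str
    metadata2: str


metadata_dic = {"metadata1": "value1", "metadata2": "value2"}

X = np.array([[-2, -1, 0, 1, 2, 1], [-2, -1, 0, 1, 2, 1]]).T
y = np.array([0, 0, 1, 1, 1, 0])

model = LogisticRegression().fit(X, y)
# sklearn ignores this attribute, but it sits in the estimator's __dict__,
# so pickling preserves it.
model.metadata = ModelMetadata(**metadata_dic)

# Local round trip standing in for saving and reloading the model.
restored = pickle.loads(pickle.dumps(model))
print(asdict(restored.metadata))
```

When the model is loaded in a different process, `ModelMetadata` must be importable there with plain pickle; cloudpickle, which `mlflow.sklearn` uses by default, instead serializes classes defined in `__main__` by value.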