
TransformedTargetRegressor save and load error


I'm defining a custom regressor using TransformedTargetRegressor, adding it to a pipeline, and saving the model to a 'joblib' file. However, when I try to load the model back, I get an error

module '__main__' has no attribute 'transform_targets'

where transform_targets is one of the functions defined for the regressor:

# min_t and max_t are module-level scaling bounds defined elsewhere
def transform_targets(targets):
    targets = (targets - min_t) / (max_t - min_t)
    return targets

def inv_transform_targets(outputs):
    outputs = outputs * (max_t - min_t) + min_t
    return outputs

# Define the model 

mlp_model = MLPRegressor(activation = 'relu', validation_fraction = 0.2, hidden_layer_sizes=(1000, ))
full_model = TransformedTargetRegressor(regressor = mlp_model, func = transform_targets,
                                 inverse_func = inv_transform_targets)

# Incorporate feature scaling via pipeline

pipeline = make_pipeline(MinMaxScaler(), full_model)
nn_model = pipeline.fit(X_train,y_train)

# Fit the model which uses the transformed target regressor + maxmin pipeline

nn_model.fit(X_train,y_train)

from joblib import dump, load
dump(nn_model, 'fitness_nn_C1.joblib')

The model works fine and predicts well, and it saves without errors, but it will not load back. If I save it with pickle instead, loading fails with a similar error:

AttributeError: Can't get attribute 'transform_targets' on <module '__main__'>

Does anyone know how to save a model that includes a TransformedTargetRegressor in a single file, so that it can be reloaded successfully? I realise I could dump the parameters/functions associated with transforming the targets into a separate file, but that is exactly what I want to avoid.

Edit:

My current workaround is to pass MinMaxScaler (or any other transformer from sklearn.preprocessing) as the target transformer instead of the custom functions, but I still don't know whether it is possible to include custom functions in this workflow.
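For reference, that workaround can be sketched as follows. This is a minimal illustration, not the asker's exact setup (LinearRegression stands in for the MLPRegressor to keep it fast): because `transformer=MinMaxScaler()` replaces the module-level functions, the fitted model contains no custom code and plain pickle can round-trip it.

```python
import pickle
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# MinMaxScaler as the *target* transformer replaces the custom
# transform_targets / inv_transform_targets pair, so the model
# holds no references to __main__-level functions
model = make_pipeline(
    MinMaxScaler(),
    TransformedTargetRegressor(regressor=LinearRegression(),
                               transformer=MinMaxScaler()),
)
model.fit(X, y)

# Round-trip through pickle: works because everything is an sklearn object
restored = pickle.loads(pickle.dumps(model))
print(np.allclose(model.predict(X), restored.predict(X)))
```

The trade-off is that the target scaling bounds are now learned from the training targets rather than fixed to chosen min_t/max_t constants.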


Solution

  • The problem is that when you try to load the file back, the loader cannot resolve transform_targets: pickle and joblib store functions by reference (module and name), not by value, so the function itself was never dumped. You can use dill to serialize it. Create a list of the items you want to dump, then use dill and joblib to serialize them as shown below:

    from sklearn.neural_network import MLPRegressor
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.datasets import make_friedman1
    from sklearn.preprocessing import MinMaxScaler
    import dill
    X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
    
    min_t = 10
    max_t = 300
    def transform_targets(targets):
       targets = (targets - min_t)/(max_t-min_t)
       return targets
    
    def inv_transform_targets(outputs):
       outputs = (outputs)*(max_t-min_t)+min_t
       return outputs
    
    # Define the model 
    
    mlp_model = MLPRegressor(activation = 'relu', validation_fraction = 0.2, hidden_layer_sizes=(1000, ))
    full_model = TransformedTargetRegressor(regressor = mlp_model, func = transform_targets,
                                     inverse_func = inv_transform_targets)
    
    # Incorporate feature scaling via pipeline
    
    pipeline = make_pipeline(MinMaxScaler(), full_model)
    nn_model = pipeline.fit(X,y)
    
    # Fit the model which uses the transformed target regressor + maxmin pipeline
    
    nn_model.fit(X,y)
    to_save = [transform_targets, inv_transform_targets, nn_model]
    r = dill.dumps(to_save)
    from joblib import dump, load
    dump(r, 'fitness_nn_C1.joblib')
    

    And now you can load it as shown below:

    from joblib import dump, load
    import dill
    Q = load('fitness_nn_C1.joblib')
    T = dill.loads(Q)
    

    T will look like this:

    [<function __main__.transform_targets(targets)>,
     <function __main__.inv_transform_targets(outputs)>,
     Pipeline(memory=None,
              steps=[('minmaxscaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                     ('transformedtargetregressor',
                      TransformedTargetRegressor(check_inverse=True,
                                                 func=<function transform_targets at 0x000001F486D27048>,
                                                 inverse_func=<function inv_transform_targets at 0x000001F4882E6C80>,
                                                 regressor=MLPRegressor(activation='relu',
                                                                        alpha=0.0001,
                                                                        batch_size='a...
                                                                        beta_2=0.999,
                                                                        early_stopping=False,
                                                                        epsilon=1e-08,
                                                                        hidden_layer_sizes=(1000,),
                                                                        learning_rate='constant',
                                                                        learning_rate_init=0.001,
                                                                        max_iter=200,
                                                                        momentum=0.9,
                                                                        n_iter_no_change=10,
                                                                        nesterovs_momentum=True,
                                                                        power_t=0.5,
                                                                        random_state=None,
                                                                        shuffle=True,
                                                                        solver='adam',
                                                                        tol=0.0001,
                                                                        validation_fraction=0.2,
                                                                        verbose=False,
                                                                        warm_start=False),
                                                 transformer=None))],
              verbose=False)]
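As a side note, the joblib layer in the answer is optional: dill can write its bytes to a file directly, which skips wrapping a byte string inside a joblib dump. A minimal sketch of the same round trip, with LinearRegression standing in for the MLPRegressor so the example stays fast (the filename is illustrative):

```python
import dill
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.compose import TransformedTargetRegressor

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

min_t, max_t = 10, 300

def transform_targets(targets):
    return (targets - min_t) / (max_t - min_t)

def inv_transform_targets(outputs):
    return outputs * (max_t - min_t) + min_t

# LinearRegression stands in for the MLPRegressor to keep the sketch fast
nn_model = make_pipeline(
    MinMaxScaler(),
    TransformedTargetRegressor(regressor=LinearRegression(),
                               func=transform_targets,
                               inverse_func=inv_transform_targets),
).fit(X, y)

# dill serializes __main__-level functions by value, so one file suffices
with open('fitness_nn_C1.dill', 'wb') as fh:
    dill.dump([transform_targets, inv_transform_targets, nn_model], fh)

# In a fresh session, dill.load restores both the functions and the model
with open('fitness_nn_C1.dill', 'rb') as fh:
    f, inv_f, restored = dill.load(fh)

print(restored.predict(X[:3]))
```

Either way, the key point is the same as in the answer: dill captures the function bodies, whereas pickle/joblib alone only record a name to look up on load.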