How can I use a keras callback in a sklearn pipeline?

I am trying to create a simple multi-layer perceptron (MLP) using Keras. To avoid data leakage I am using a pipeline in a cross-validation routine.

To do that I have to use a Keras wrapper. Everything works fine until I put a TensorBoard callback into the wrapper. I have read tons of Stack Overflow answers and my code looks correct, but I get the following error:

> RuntimeError: Cannot clone object <tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x00000245DD5C2A60>, as the constructor either does not set or modifies parameter callbacks

Below is my code:

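import tensorflow as tf
from tensorflow import keras
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

#Network and training parameters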
EPOCHS = 100
INPUT_SHAPE = (Xtr.shape[1],)
OUTPUT_SHAPE = 1 #number of outputs

def build_mlp(n_hidden, input_shape, output_shape):
    #Build the model
    model = tf.keras.models.Sequential()
    model.add(keras.layers.Dense(units = n_hidden,
                                 input_shape = input_shape,
                                 name = 'dense_layer_1',
                                 activation = 'relu'))
    model.add(keras.layers.Dense(units = output_shape,
                                 name ='output_layer',
                                 activation = 'sigmoid'))
    return model

import datetime
LOG_DIR = "logs/MLP_anomaly/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
CALLBACKS = [tf.keras.callbacks.TensorBoard(log_dir = LOG_DIR)]

#create a wrapper to use sklearn pipelines
sk_model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=build_mlp,
                                                          callbacks = CALLBACKS,
                                                          n_hidden = N_HIDDEN,
                                                          input_shape = INPUT_SHAPE,
                                                          output_shape = OUTPUT_SHAPE)

#use a pipeline
pipe = Pipeline([('scaler', MinMaxScaler()), ('mlp', sk_model)])

n_splits, n_repeats = 3, 1
cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
cv_rslt = cross_validate(pipe, Xtrx, Ytr, cv=cv,
                         return_train_score = True,
                         scoring = 'accuracy',
                         return_estimator = True)

The full error I am getting is:

I have already tried putting the callback like this:


or putting the callback in the fit_params argument of the cross_validate function. Nothing works for me. Does anyone have a suggestion?

Thank you very much
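A note on where the error probably comes from. As far as I understand it, sklearn's clone rebuilds the estimator from get_params() and then checks that every constructor argument comes back as the identical object (an `is` check). My assumption, not verified against every TensorFlow version, is that the old KerasClassifier wrapper returns deep copies of its parameters from get_params(). If so, immutable arguments survive the check while a list of callbacks does not. The sketch below only demonstrates the deep-copy identity behaviour with the standard library; the values are hypothetical stand-ins for the arguments in the question, and `survives_identity_check` is a helper name made up for illustration.

```python
import copy

# Hypothetical stand-ins for the constructor arguments used in the question.
n_hidden = 16
input_shape = (10,)
callbacks = [object()]  # stands in for [tf.keras.callbacks.TensorBoard(...)]


def survives_identity_check(value):
    """Return True if a deep copy of `value` is the very same object."""
    return copy.deepcopy(value) is value


results = {
    "n_hidden": survives_identity_check(n_hidden),
    "input_shape": survives_identity_check(input_shape),
    "callbacks": survives_identity_check(callbacks),
}
print(results)
```

Only the mutable list fails the identity check, which would match the fact that the error message names the callbacks parameter and none of the others.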


  • I finally found a solution, although it is really more of a workaround. I am posting it here in the hope that it is useful to other ML practitioners. My problem can be explained in three steps:

    1. sklearn does not provide a method to plot the training history of a model. The only thing I found that resembles the Keras history is in MLPClassifier, which has the attributes loss_ and loss_curve_.
    2. tensorflow and keras do not provide cross-validation and pipeline routines to avoid data leakage (since in deep learning there is usually no room for CV).
    3. wrapping a Keras MLP with KerasClassifier and putting it in a sklearn pipeline does not help, because it is not possible to extract the training history of the classifier from the pipeline (when calling the fit function).
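To illustrate point 1, here is a minimal sketch of how a per-fold loss history can be read from a fitted pipeline when the model is sklearn's own MLPClassifier rather than the Keras wrapper. The synthetic data, the hidden layer size and the iteration count are assumptions for the example, not the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real Xtr/Ytr are not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Scaling lives inside the pipeline, so it is refit on each training fold.
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=50, random_state=0)),
])

cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=1, random_state=0)
cv_rslt = cross_validate(pipe, X, y, cv=cv, scoring="accuracy",
                         return_estimator=True)

# Each fitted pipeline keeps its own MLP; loss_curve_ holds one loss per iteration.
loss_histories = [est.named_steps["mlp"].loss_curve_
                  for est in cv_rslt["estimator"]]
for fold, history in enumerate(loss_histories):
    print(f"fold {fold}: {len(history)} iterations, final loss {history[-1]:.4f}")
```

With max_iter this small sklearn will likely emit convergence warnings, which is harmless for the purpose of looking at the curve.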

    So in the end I used the sklearn function plot_validation_curve to plot the MLP loss as a function of the training epochs. To avoid data leakage I used a pipeline and sklearn's cross-validation.
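A rough sketch of this kind of workaround, under several assumptions: the function I can confirm in sklearn.model_selection is validation_curve, so that is what is used here. The model is sklearn's MLPClassifier as a stand-in for the Keras MLP, and max_iter stands in for the number of epochs (with the Keras wrapper the parameter name would presumably be mlp__epochs). The data is synthetic and the plotting step is left out.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, validation_curve
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real training set is not shown).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), random_state=0)),
])
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=1, random_state=0)

# Refit the whole pipeline for each iteration budget and each CV split.
param_range = [5, 10, 20, 40]
train_scores, test_scores = validation_curve(
    pipe, X, y,
    param_name="mlp__max_iter",
    param_range=param_range,
    cv=cv,
    scoring="neg_log_loss",
)

# Scores are negated log loss, so flip the sign and average over the folds.
train_loss = -train_scores.mean(axis=1)
val_loss = -test_scores.mean(axis=1)
for n_iter, tr, va in zip(param_range, train_loss, val_loss):
    print(f"max_iter={n_iter}: train loss {tr:.4f}, validation loss {va:.4f}")
```

This retrains the model from scratch for every value in param_range, so it costs more than reading a single training history, but the scaler is always fit on the training folds only.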

    MLP training history