python tensorflow scikit-learn conv-neural-network hyperparameters

skopt's gp_minimize() function raises ValueError: array must not contain infs or NaNs

I am using the skopt (scikit-optimize) package to tune the hyperparameters of a neural network (I am trying to minimize -1 * accuracy). It runs fine (and prints to the console) for several iterations before it raises ValueError: array must not contain infs or NaNs.

What are some possible causes of this? My data does not contain infs or NaNs, and neither do my search parameter ranges. The neural network code is quite long, so for brevity I will paste only the relevant sections.

Imports:

import pandas as pd

import numpy as np
from skopt import gp_minimize
from skopt.utils import use_named_args
from skopt.space import Real, Categorical, Integer
from tensorflow.python.framework import ops
from sklearn.model_selection import train_test_split

import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Dropout, MaxPooling1D, Flatten

# Use the backend bundled with tensorflow.keras so it matches the model imports above
from tensorflow.keras import backend as K

Creation of search parameters:

dim_num_filters_L1 = Integer(low=1, high=50, name='num_filters_L1')
#dim_kernel_size_L1 = Integer(low=1, high=70, name='kernel_size_L1')
dim_activation_L1 = Categorical(categories=['relu', 'linear', 'softmax'], name='activation_L1')
dim_num_filters_L2 = Integer(low=1, high=50, name='num_filters_L2')
#dim_kernel_size_L2 = Integer(low=1, high=70, name='kernel_size_L2')
dim_activation_L2 = Categorical(categories=['relu', 'linear', 'softmax'], name='activation_L2')
dim_num_dense_nodes = Integer(low=1, high=28, name='num_dense_nodes')
dim_activation_L3 = Categorical(categories=['relu', 'linear', 'softmax'], name='activation_L3')
dim_dropout_rate = Real(low = 0, high = 0.5, name = 'dropout_rate')
dim_learning_rate = Real(low=1e-4, high=1e-2, name='learning_rate')

dimensions = [dim_num_filters_L1,
              #dim_kernel_size_L1,
              dim_activation_L1,
              dim_num_filters_L2,
             #dim_kernel_size_L2,
              dim_activation_L2,
              dim_num_dense_nodes,
              dim_activation_L3,
              dim_dropout_rate,
              dim_learning_rate,
             ]

Function that creates all models that will be tested:

def create_model(num_filters_L1, #kernel_size_L1, 
                 activation_L1, 
                 num_filters_L2, #kernel_size_L2, 
                 activation_L2,
                 num_dense_nodes, activation_L3,
                 dropout_rate,
                 learning_rate):

    input_shape = (X_train.shape[1], 1)
    model = Sequential()
    model.add(Conv1D(num_filters_L1, kernel_size = 40, activation = activation_L1, input_shape = input_shape))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(num_filters_L2, kernel_size=20, activation=activation_L2))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(num_dense_nodes, activation = activation_L3))
    model.add(Dropout(dropout_rate))
    model.add(Dense(y_train.shape[1], activation='linear'))
    adam = tensorflow.keras.optimizers.Adam(learning_rate = learning_rate)
    model.compile(optimizer=adam, loss='mean_squared_error', metrics=['accuracy'])

    return model

Define fitness function:

@use_named_args(dimensions=dimensions)
def fitness(num_filters_L1, #kernel_size_L1, 
                 activation_L1, 
                 num_filters_L2, #kernel_size_L2, 
                 activation_L2,
                 num_dense_nodes, activation_L3,
                 dropout_rate,
                 learning_rate):

    model = create_model(num_filters_L1, #kernel_size_L1, 
                 activation_L1, 
                 num_filters_L2, #kernel_size_L2, 
                 activation_L2,
                 num_dense_nodes, activation_L3,
                 dropout_rate,
                 learning_rate)

    history_opt = model.fit(x=X_train,
                        y=y_train,
                        validation_data=(X_val,y_val), 
                        shuffle=True, 
                        verbose=2,
                        epochs=10
                        )

    # Evaluate on the held-out test set; index 1 of the result is the accuracy metric.
    accuracy_opt = model.evaluate(X_test,y_test)[1]

    # Print the classification accuracy:
    print("Experimental Model Accuracy: {0:.2%}".format(accuracy_opt))

    # Delete the Keras model with these hyper-parameters from memory:
    del model

    # Clear the Keras session, otherwise it will keep adding new models to the same TensorFlow graph each time we create model with a different set of hyper-parameters.
    K.clear_session()
    ops.reset_default_graph()

    # the optimizer aims for the lowest score, so return negative accuracy:
    return -accuracy_opt # or sum(RMSE)?

Run hyperparameter search:

gp_result = gp_minimize(func=fitness,
                        dimensions=dimensions)

print("best accuracy was " + str(round(gp_result.fun *-100,2))+"%.")

Solution

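The network is most likely failing to converge for one of the hyperparameter combinations that the acquisition function samples, and the activation function is the culprit. I ran into the same problem and fixed it by removing 'relu' from the activation search space.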
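If you would rather keep every activation in the search space, another option is to guard the objective so the optimizer never receives a non-finite value. The snippet below is only a minimal sketch of that idea, not something taken from skopt itself: make_finite, toy_fitness and the penalty value are hypothetical names and choices of mine, and it assumes the failing run shows up as a NaN or inf return value from the objective.

```python
import math


def make_finite(objective, penalty=0.0):
    """Wrap an objective so NaN/inf results are replaced by a finite penalty.

    Hypothetical helper: penalty=0.0 corresponds to 0% accuracy when the
    objective returns negative accuracy, i.e. the worst possible score.
    """
    def wrapped(*args, **kwargs):
        value = float(objective(*args, **kwargs))
        if not math.isfinite(value):
            # Training diverged for this point; report the worst score instead.
            return penalty
        return value
    return wrapped


# Toy stand-in for the real fitness function (assumption for illustration):
# it "diverges" and returns NaN whenever the first parameter exceeds 0.5.
def toy_fitness(params):
    x = params[0]
    return float("nan") if x > 0.5 else -x


safe_fitness = make_finite(toy_fitness, penalty=0.0)
print(safe_fitness([0.2]))  # finite result is passed through unchanged
print(safe_fitness([0.9]))  # NaN result is replaced by the penalty
```

With the code from the question, this would be used as gp_minimize(func=make_finite(fitness), dimensions=dimensions). The penalised points still tell the optimizer that the region is bad, so it tends to move away from it instead of crashing.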