Tags: python, scikit-learn, skopt

Optimize the hidden_layer_sizes hyperparameter of MLPClassifier with skopt


How can I optimize the number of hidden layers and the size of each hidden layer in a neural network using MLPClassifier from sklearn and skopt?

Usually I'd specify my space something like:

Space([Integer(name='alpha_1', low=1, high=2),
       Real(10**-5, 10**0, "log-uniform", name='alpha_2')])

(let's say the hyperparameters alpha_1 and alpha_2).

With the neural network implementation in sklearn, I need to tune hidden_layer_sizes, which is a tuple:

 hidden_layer_sizes : tuple, length = n_layers - 2, default=(100,)
     The ith element represents the number of neurons in the ith
     hidden layer.
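
For example, if I read the docs correctly, hidden_layer_sizes=(100, 50) would give a network with two hidden layers of 100 and 50 neurons:

from sklearn.neural_network import MLPClassifier

# two hidden layers: 100 neurons in the first, 50 in the second
clf = MLPClassifier(hidden_layer_sizes=(100, 50))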

How can I represent this in Space?


Solution

  • If you are using gp_minimize, you can include the number of hidden layers and the number of neurons per layer as parameters in Space. Inside the objective function you can then build the hidden_layer_sizes tuple from them manually.

    This is an example adapted from the scikit-optimize homepage, now using an MLPRegressor (and the California housing data, since load_boston has been removed from recent scikit-learn releases):

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import cross_val_score
    from skopt.space import Integer, Categorical
    from skopt.utils import use_named_args
    from skopt import gp_minimize

    # load_boston was removed in scikit-learn 1.2; the California
    # housing data serves as a drop-in regression dataset
    housing = fetch_california_housing()
    X, y = housing.data, housing.target
    n_features = X.shape[1]
    
    reg = MLPRegressor(random_state=0)
    
    space = [
        Categorical(['tanh', 'relu'], name='activation'),
        Integer(1, 4, name='n_hidden_layer'),
        Integer(200, 2000, name='n_neurons_per_layer')]
    
    @use_named_args(space)
    def objective(**params):
        n_neurons = params['n_neurons_per_layer']
        n_layers = params['n_hidden_layer']
    
        # build hidden_layer_sizes as a tuple of length n_layers,
        # with n_neurons neurons in each hidden layer
        params['hidden_layer_sizes'] = (n_neurons,) * n_layers
    
        # drop the auxiliary parameters, since MLPRegressor does not accept them
        params.pop('n_neurons_per_layer')
        params.pop('n_hidden_layer')
    
        reg.set_params(**params)
    
        # cross_val_score returns the negated MAE, and gp_minimize
        # minimizes, so flip the sign back to a positive error
        return -np.mean(cross_val_score(reg, X, y, cv=5, n_jobs=-1,
                                        scoring="neg_mean_absolute_error"))
    
    res_gp = gp_minimize(objective, space, n_calls=50, random_state=0)
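
    After the run, the best configuration can be read back from the result object and turned into the final tuple in the same way (a minimal sketch; x and fun are standard fields of skopt's OptimizeResult, in the order of the space dimensions):

    best_activation, best_n_layers, best_n_neurons = res_gp.x

    print("best cross-validated MAE: %.4f" % res_gp.fun)
    print("activation:", best_activation)
    print("hidden_layer_sizes:", (best_n_neurons,) * best_n_layers)

    Note that this search forces every hidden layer to have the same width. If each layer should get its own size, you would need one Integer dimension per potential layer and build the tuple from those values inside the objective.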