So I am working on the MNIST and Boston_Housing datasets using Keras, and I was wondering how to determine the optimal number of layers and the activation function for each layer. I am not asking what the optimal number of layers or activation functions is, but rather what process I should go through to determine these parameters.
I am evaluating my model using mean squared error and mean absolute error. Here is what my current model looks like:
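from keras import layers, models

model = models.Sequential()
model.add(layers.Dense(64, activation='relu'))
# `init` is the Keras 1 argument name; Keras 2 uses `kernel_initializer`,
# and the activation can be given as a string instead of an Activation layer
model.add(layers.Dense(64, kernel_initializer='glorot_uniform', activation='selu'))
model.add(layers.Dense(64, activation='softplus'))
model.add(layers.Dense(1))  # single linear output unit for regression
model.compile(optimizer='rmsprop',
              loss='mse',       # training loss: mean squared error
              metrics=['mae'])  # also report mean absolute error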
I currently have a mean absolute error of 3.5 and a mean squared error of 27.
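The two numbers are easier to compare in the same units. Assuming 3.5 is the MAE and 27 is the MSE, taking the square root of the MSE gives the RMSE, which is in the target's units. For Keras's Boston Housing data the target is the median home value in thousands of dollars, so the MAE is roughly a $3,500 average error. RMSE is always at least as large as MAE, and a larger ratio means a few large errors carry more of the loss. A small sketch of that arithmetic:

```python
import math

mse = 27.0  # mean squared error reported above
mae = 3.5   # mean absolute error reported above (assumed to be the MAE)

# RMSE puts the squared-error metric back into the target's units (k$ here)
rmse = math.sqrt(mse)
ratio = rmse / mae  # >= 1 always; larger means outliers dominate more

print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}, RMSE/MAE = {ratio:.2f}")
```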
For choosing the activation function,
For choosing the number of layers,
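One common, systematic way to choose both is to treat the number of hidden layers and the activation function as hyperparameters. You train one model per combination and keep the combination with the lowest average validation error under K-fold cross-validation. K-fold helps here because Boston Housing has only a few hundred training samples, so a single validation split is noisy. The sketch below is one possible approach, not a definitive recipe. `kfold_indices`, `grid_search`, and `keras_val_mae` are hypothetical helpers. The layer width of 64, `epochs=80`, and `batch_size=16` are illustrative assumptions, and `x`/`y` are assumed to be NumPy arrays. Keras is imported lazily, so the search logic runs even where Keras is not installed.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def kfold_indices(n_samples: int, k: int) -> List[Fold]:
    """Hypothetical helper: split range(n_samples) into k contiguous folds.

    The first n_samples % k folds get one extra sample; returns (train, val) index lists.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, val))
        start = stop
    return folds


def grid_search(score_fn: Callable[[int, str, List[int], List[int]], float],
                depths: Sequence[int], activations: Sequence[str],
                n_samples: int, k: int = 4):
    """Hypothetical helper: average score_fn over k folds for every (depth, activation).

    Lower scores are treated as better (e.g. validation MAE).
    """
    folds = kfold_indices(n_samples, k)
    results: Dict[Tuple[int, str], float] = {}
    for depth, act in itertools.product(depths, activations):
        scores = [score_fn(depth, act, train, val) for train, val in folds]
        results[(depth, act)] = sum(scores) / len(scores)
    best = min(results, key=results.get)
    return best, results


def keras_val_mae(depth, activation, train_idx, val_idx, x, y, epochs=80):
    """Hypothetical scorer: train a Dense stack on one fold, return its validation MAE."""
    from keras import layers, models  # lazy import: only needed when actually training

    model = models.Sequential()
    for _ in range(depth):  # `depth` hidden layers, all with the same activation
        model.add(layers.Dense(64, activation=activation))
    model.add(layers.Dense(1))  # linear output for regression
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    model.fit(x[train_idx], y[train_idx], epochs=epochs, batch_size=16, verbose=0)
    _, mae = model.evaluate(x[val_idx], y[val_idx], verbose=0)  # [loss, mae]
    return mae
```

For example, with `from functools import partial`, calling `grid_search(partial(keras_val_mae, x=x_train, y=y_train), [1, 2, 3], ['relu', 'selu', 'softplus'], n_samples=len(x_train))` would score nine configurations on the training data only. The held-out test set is then used once, on the winning configuration. A full grid grows multiplicatively with each added hyperparameter, so random search over the same ranges is a common alternative when the grid gets large.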