Tags: python, tensorflow, machine-learning, keras, hyperparameters

How to determine optimal number of layers and activation function(s)


I am working on the MNIST and Boston_Housing datasets using Keras, and I was wondering how to determine the optimal number of layers and the activation function for each layer. I am not asking what the optimal number of layers or activation functions is, but rather what process I should go through to determine these parameters.

I am evaluating my model using mean squared error and mean absolute error. Here is what my current model looks like:

    from tensorflow.keras import models, layers

    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(64, kernel_initializer='glorot_uniform', activation='selu'))
    model.add(layers.Dense(64, activation='softplus'))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop',
                  loss='mse',
                  metrics=['mae'])

I get a mean absolute error of 3.5 and a mean squared error of 27.


Solution

  • For choosing the activation function,

    1. Modern neural networks mainly use ReLU or leaky ReLU in the hidden layers.
    2. For classification, a softmax activation is used at the output layer.
    3. For regression, a linear activation is used at the output layer.
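The two output-layer choices above can be sketched side by side. This is a minimal example, assuming an MNIST-style classifier with 784 flattened input features and 10 classes, and a Boston Housing-style regressor with 13 input features; the layer widths are placeholders, not tuned values.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Classification (e.g. MNIST, 10 classes): softmax at the output layer.
clf = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),  # probabilities over 10 classes
])
clf.compile(optimizer='rmsprop',
            loss='categorical_crossentropy',
            metrics=['accuracy'])

# Regression (e.g. Boston Housing): linear output, i.e. no activation.
reg = keras.Sequential([
    keras.Input(shape=(13,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1),  # Dense defaults to a linear activation
])
reg.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
```

Note that `Dense(1)` with no `activation` argument is already the linear output the regression case calls for.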

    For choosing the number of layers,

    1. It depends entirely on your problem.
    2. More layers help when the data is complex, since deeper nets can approximate the function between input and output more efficiently.
    3. Sometimes, for smaller problems like MNIST, even a net with 2 hidden layers works well.
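In practice, the process is to treat the depth as a hyperparameter and compare candidate depths on a held-out validation set. Below is a minimal sketch of that loop; the synthetic data, the candidate depths `(1, 2, 3)`, and the epoch count are all placeholder assumptions standing in for your real Boston Housing train/validation split and a longer training budget.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in for a regression dataset with 13 features
# (assumption: replace with your actual train/validation split).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 13)).astype('float32')
w = rng.normal(size=(13, 1)).astype('float32')
y = x @ w + 0.1 * rng.normal(size=(500, 1)).astype('float32')
x_train, y_train = x[:400], y[:400]
x_val, y_val = x[400:], y[400:]

def build_model(n_hidden):
    """Build a regression net with n_hidden ReLU layers of width 64."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(13,)))
    for _ in range(n_hidden):
        model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))  # linear output for regression
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model

# Compare candidate depths by validation MAE.
results = {}
for n_hidden in (1, 2, 3):
    model = build_model(n_hidden)
    model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
    _, val_mae = model.evaluate(x_val, y_val, verbose=0)
    results[n_hidden] = val_mae

best = min(results, key=results.get)
print(f"best depth by validation MAE: {best}")
```

The same loop extends to activation functions or layer widths; for anything beyond a handful of candidates, a dedicated search tool such as KerasTuner automates this comparison.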