I am using the sknn package to build a neural network. To optimize the network's parameters for my dataset, I am using an evolutionary algorithm. Since the package lets me build a neural net where each layer has a different activation function, I was wondering whether that is a practical choice, or whether I should just use one activation function per net. Does having multiple activation functions in a neural net harm the network, make no difference, or benefit it?
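For example, sknn lets me give each layer its own activation, along these lines (a rough sketch; the layer types and sizes here are just placeholders for what the evolutionary algorithm would choose):

```python
from sknn.mlp import Classifier, Layer

# Each hidden layer gets its own activation type; the output layer is Softmax.
nn = Classifier(
    layers=[
        Layer("Rectifier", units=64),
        Layer("Tanh", units=32),
        Layer("Sigmoid", units=16),
        Layer("Softmax"),
    ],
    learning_rate=0.02,
    n_iter=25,
)
# nn.fit(X_train, y_train)
```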
Also, what is the maximum number of neurons I should have per layer, and the maximum number of layers I should have per net?
A neural network is just a (big) mathematical function. You could even use different activation functions for different neurons in the same layer. Different activation functions allow for different non-linearities, which might work better for approximating a specific function. Using a sigmoid as opposed to a tanh will only make a marginal difference. What matters more is that the activation has a nice derivative. The reason tanh and sigmoid are usually used is that for values close to 0 they act like a linear function, while for large absolute values they act more like the sign function (-1 or 0, and 1), and they have a nice derivative. A relatively recently introduced one is the ReLU (max(x, 0)), which has a very easy derivative (except at x = 0), is non-linear, and, importantly, is fast to compute, which makes it nice for deep networks with long training times.
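To make those shapes concrete, here is a small NumPy sketch of the three activations and their derivatives (purely illustrative, nothing sknn-specific):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # nice closed-form derivative

def tanh(x):
    return np.tanh(x)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2    # also cheap to compute

def relu(x):
    return np.maximum(x, 0.0)

def d_relu(x):
    return (x > 0.0).astype(float)  # undefined at x = 0; 0 is used by convention
```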
What it comes down to is that for overall performance this choice is not very important; the non-linearity and the capped range are what matter. To squeeze out the last percentage points this choice will matter, but that is mostly dependent on your specific data. This choice, just like the number of hidden layers and the number of neurons inside those layers, will have to be found by cross-validation, although you could adapt your genetic operators to include them.
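As a rough sketch of that last point, you could encode the per-layer activation and size directly in your genome and score each candidate with cross-validation. The names below are hypothetical, and I'm assuming an sknn `Classifier` can be passed to scikit-learn's `cross_val_score`:

```python
import random
from sklearn.model_selection import cross_val_score
from sknn.mlp import Classifier, Layer

ACTIVATIONS = ["Rectifier", "Tanh", "Sigmoid"]

def random_genome(max_layers=3, max_units=128):
    # One (activation, units) gene per hidden layer.
    n_layers = random.randint(1, max_layers)
    return [(random.choice(ACTIVATIONS), random.randint(8, max_units))
            for _ in range(n_layers)]

def fitness(genome, X, y):
    # Build the network described by the genome and score it.
    layers = [Layer(act, units=units) for act, units in genome]
    layers.append(Layer("Softmax"))
    net = Classifier(layers=layers, learning_rate=0.02, n_iter=25)
    return cross_val_score(net, X, y, cv=3).mean()
```

Your mutation and crossover operators would then act on the genome (adding/removing layers, changing the activation or unit count of a layer) instead of only on the weights or learning parameters.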