machine-learning · neural-network · mnist

How to normalize training data for different activation functions?


I'm training a fully connected neural network to classify the MNIST dataset. The input data are square grayscale images with pixel values in [0, 255].

I've read that when using the sigmoid() activation one needs to normalize the input to be in [0, 1] (the range of sigmoid).

How should I normalize the input data for the tanh() activation? Do I need to rescale it to [-1, 1] (the range of tanh), or can it stay in [0, 1]?

Which approach is better, and why? What is the general guidance?


Solution

  • You don't have to use a different normalization for a different activation function. In fact, you don't even have to normalize the input to [0, 1] for sigmoid. The [0, 1] range of sigmoid is the range of its output; its input domain is the entire real line, from negative infinity to positive infinity.

    What's more, your input does not go directly into the sigmoid function, so the range of your image data is not the same as the range of values the sigmoid receives. There are linear layers in between that change the range of the data.

    The general guidance is to normalize your input to be in [-1, 1]. This has nothing to do with the activation function; it is a generally effective measure for backpropagation, because zero-centered inputs help gradient descent converge. See Efficient BackProp (LeCun et al.). A minimal sketch of both rescalings is shown below.
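
Here is a minimal NumPy sketch of the two rescalings discussed above (the function names and the fake batch are my own, for illustration): dividing by 255 maps [0, 255] pixels to [0, 1], while dividing by 127.5 and subtracting 1 maps them to the zero-centered range [-1, 1].

```python
import numpy as np

def to_zero_one(images: np.ndarray) -> np.ndarray:
    """Rescale uint8 pixel values from [0, 255] to [0, 1]."""
    return images.astype(np.float32) / 255.0

def to_minus_one_one(images: np.ndarray) -> np.ndarray:
    """Rescale uint8 pixel values from [0, 255] to the zero-centered range [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0

# Example: a fake batch of 28x28 MNIST-style images
batch = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

print(to_zero_one(batch).min(), to_zero_one(batch).max())        # roughly 0.0 .. 1.0
print(to_minus_one_one(batch).min(), to_minus_one_one(batch).max())  # roughly -1.0 .. 1.0
```

Either version will work with sigmoid or tanh hidden units; the zero-centered [-1, 1] version simply tends to make training converge a bit more smoothly.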