
Dataset values distribution for sigmoid and tanh


As many papers point out, a neural network learns better when the dataset is normalized so that its values roughly follow a Gaussian distribution.

Does this apply only if we use the sigmoid function as the squashing function? If not, what standard deviation is best for the tanh squashing function?
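
For concreteness, the kind of normalization meant here is standardizing each feature to zero mean and unit variance. A minimal NumPy sketch (the dataset below is made up):

    import numpy as np

    # Hypothetical raw dataset: 1000 samples, 3 features on arbitrary scales.
    X = np.random.uniform(0, 255, size=(1000, 3))

    # Standardize each feature to zero mean and unit variance,
    # so the values roughly follow a standard Gaussian shape.
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)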


Solution

  • Does this apply only if we use the sigmoid function as the squashing function?

    No. The activation distribution obviously depends on the activation function; that's why, in particular, the initialization techniques differ for sigmoid- and ReLU-based neural networks. See the difference between Xavier and He initialization in this question. The same is true for the input distribution.
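
    For reference, a minimal sketch of the two initialization schemes mentioned above, in their Gaussian variants (the layer sizes here are made up):

        import numpy as np

        rng = np.random.default_rng(0)
        fan_in, fan_out = 256, 128  # hypothetical layer dimensions

        # Xavier/Glorot (normal variant), suited to sigmoid/tanh:
        # variance 2 / (fan_in + fan_out)
        W_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                              size=(fan_in, fan_out))

        # He (normal variant), suited to ReLU: variance 2 / fan_in,
        # doubled to compensate for ReLU zeroing out about half the units.
        W_he = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                          size=(fan_in, fan_out))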

    If not, what standard deviation is best for the tanh squashing function?

    But tanh is a scaled and shifted sigmoid:

    tanh(x) = 2⋅sigmoid(2x) - 1
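
    A quick numeric check of this identity (a NumPy sketch):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        x = np.linspace(-5.0, 5.0, 101)
        # tanh(x) and 2*sigmoid(2x) - 1 agree up to floating-point error.
        assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)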
    

    So if the activations are normally distributed for the sigmoid activation, they will still be normally distributed for tanh, only with a scaled standard deviation and a shifted mean. So the same input distribution works fine for tanh. If you'd prefer to get the same Gaussian variance, you can scale the input by sqrt(2), but it's really not that significant.
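
    A small sketch illustrating the relation on standard-normal inputs: by the identity above, the tanh activations have exactly twice the standard deviation of the sigmoid activations of the doubled input, and a shifted mean, while the shape of the distribution is unchanged.

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.normal(0.0, 1.0, size=100_000)  # standard-normal inputs

        sig2x = 1.0 / (1.0 + np.exp(-2.0 * x))  # sigmoid applied to 2x
        tanh_x = np.tanh(x)

        # From tanh(x) = 2*sigmoid(2x) - 1:
        # mean(tanh) = 2*mean(sigmoid(2x)) - 1, std(tanh) = 2*std(sigmoid(2x))
        print(f"sigmoid(2x): mean={sig2x.mean():+.4f}, std={sig2x.std():.4f}")
        print(f"tanh(x):     mean={tanh_x.mean():+.4f}, std={tanh_x.std():.4f}")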