Tags: math, machine-learning, artificial-intelligence, derivative, sigmoid

Sigmoid function and derivative of sigmoid function in ANN


I'm building an ANN from a tutorial. In the tutorial, the sigmoid and dsigmoid are defined as follows:

sigmoid(x) = tanh(x)

dsigmoid(x) = 1-x*x

However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):

dsigmoid(x) = sech(x)*sech(x)

When using 1-x*x, the training converges, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.

The question is: why does 1-x*x work (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations holds wrong weights)?
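As a quick sanity check (a standalone Python sketch, not from the original post), the two formulas are in fact the same function of x, since the identity sech^2(x) = 1 - tanh^2(x) holds; so the difference in training behavior must come from the argument the code passes in, not from the formulas themselves:

```python
import math

# sech(x)^2 and 1 - tanh(x)^2 are the same function of x,
# so the two derivative formulas agree when given the same argument.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    sech2 = 1.0 / math.cosh(x) ** 2
    alt = 1.0 - math.tanh(x) ** 2
    assert abs(sech2 - alt) < 1e-12
print("sech(x)^2 == 1 - tanh(x)^2 for all tested x")
```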


Solution

  • In the first set of formulas, the derivative is expressed as a function of the function value, that is

    tanh'(x) = 1 - tanh(x)^2 = dsigmoid(sigmoid(x))
    

    Since the existing code is most likely written that way, calling dsigmoid on the stored activation (the output of sigmoid) rather than on the pre-activation x, you will get the wrong derivative if you replace that formula with the "right" one, which expects x as its argument.
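    A minimal sketch of this distinction (assuming a NumPy-based implementation, as in typical ANN tutorials; the variable names are illustrative, not from the original code):

    ```python
    import numpy as np

    def sigmoid(x):
        return np.tanh(x)

    def dsigmoid(y):
        # NOTE: y is the *output* of sigmoid, not the pre-activation x.
        # Since d/dx tanh(x) = 1 - tanh(x)^2, this equals the true
        # derivative exactly when called as dsigmoid(sigmoid(x)).
        return 1.0 - y * y

    x = 0.7
    y = sigmoid(x)                           # stored activation, as in backprop
    true_derivative = 1.0 / np.cosh(x) ** 2  # sech(x)^2

    print(dsigmoid(y))         # applied to the activation y: matches sech(x)^2
    print(true_derivative)
    print(dsigmoid(x))         # applying the same formula to x gives a different value
    ```

    So replacing `1 - y*y` with `sech(x)**2` while still passing in the activation y feeds the derivative the wrong argument, which is why training stops converging.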