I'm building an ANN from a tutorial. In the tutorial, sigmoid and dsigmoid are given as follows:
sigmoid(x) = tanh(x)
dsigmoid(x) = 1-x*x
However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):
dsigmoid(x) = sech(x)*sech(x)
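In code, the two versions look like this (a sketch; I'm assuming a NumPy-based implementation like the tutorial's):

    import numpy as np

    def sigmoid(x):
        return np.tanh(x)

    def dsigmoid(x):                      # the tutorial's version
        return 1 - x * x

    def dsigmoid_math(x):                 # the "mathematically correct" version I tried
        return 1.0 / np.cosh(x) ** 2      # sech(x)^2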
When using 1-x*x, the training converges, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.
The question is: why does 1-x*x work (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations has wrong weights)?
In the first set of formulas, the derivative is expressed as a function of the function's *output* value, using the identity sech(x)^2 = 1 - tanh(x)^2:

tanh'(x) = 1 - tanh(x)^2 = dsigmoid(sigmoid(x))
The existing code almost certainly relies on this: during backpropagation it passes the stored activation sigmoid(x), not the raw input x, into dsigmoid. If you swap in the "right" formula while still feeding it the activation, you compute sech(sigmoid(x))^2, which is the derivative evaluated at the wrong point, so you get the wrong gradient.
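Here is a minimal sketch of the difference (in Python with NumPy, which I'm assuming matches your tutorial's setup):

    import numpy as np

    def sigmoid(x):
        return np.tanh(x)

    def dsigmoid(y):
        # y is the stored *activation*, i.e. y = sigmoid(x) = tanh(x).
        # Since tanh'(x) = 1 - tanh(x)^2, this equals the true derivative.
        return 1.0 - y * y

    x = 0.5
    y = sigmoid(x)                          # forward pass stores the activation

    true_derivative = 1.0 / np.cosh(x)**2   # sech(x)^2 evaluated at the input
    via_activation  = dsigmoid(y)           # 1 - y^2: what the training code computes
    wrong           = 1.0 / np.cosh(y)**2   # sech(y)^2: "correct" formula fed the activation

    print(true_derivative, via_activation)  # agree: ~0.7864 and ~0.7864
    print(wrong)                            # differs: ~0.8135

So if you really want to use sech^2 directly, you would have to store the pre-activation x and pass that into the derivative instead. Expressing the derivative through the output that was already computed is simply cheaper, which is presumably why the tutorial does it that way.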