I came across some different error calculation functions for backpropagation: the squared error function from http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/, and a nicely explained derivation of the following backpropagation error term:
Error = Output(i) * (1 - Output(i)) * (Target(i) - Output(i))
Now I'm wondering: how many more are there, and what difference does the choice make for training?
Also, since I understand that the second example uses the derivative of the activation function used by the layer, does the first one also do this in some way? And would that be true for any loss function (if there are more)?
Finally, how do you know which one to use, and when?
This was a very broad question, but I can shed some light on the error / cost function part.
There are many different cost functions that can be applied when working with neural networks; none of them are specific to neural networks. The most common ones are probably the mean squared error (MSE) and the cross-entropy cost function. The latter is often the most appropriate choice when working with logistic or softmax output layers, while MSE is convenient because it does not require the output values to be in the range [0, 1].
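For concreteness, here is a minimal NumPy sketch (my own illustration, not taken from either linked source) of the two cost functions mentioned above, evaluated over a small batch of targets t and network outputs y:

```python
import numpy as np

def mse(y, t):
    """Mean squared error: average of 0.5 * (t - y)^2 over the batch."""
    return np.mean(0.5 * (t - y) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """Binary cross-entropy; assumes y lies in (0, 1), e.g. sigmoid outputs."""
    y = np.clip(y, eps, 1 - eps)  # avoid log(0)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))

t = np.array([1.0, 0.0, 1.0])   # targets (made-up example values)
y = np.array([0.9, 0.2, 0.6])   # network outputs
print(mse(y, t), cross_entropy(y, t))
```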
The different cost functions exhibit different convergence properties and have their own pros and cons; you'll have to read up on the ones that interest you.
Danielle Ensign has compiled a short, nice list of cost functions over at CrossValidated.
You have confused the derivative of the squared error function with something else. The equation you've quoted as the derivative of the error function is actually the derivative of the error function times the derivative of the output layer's activation function. This product is the delta of the output layer.
The squared error function and its derivative are defined as (writing $y$ for the output and $t$ for the target):

$$E = \frac{1}{2}(t - y)^2, \qquad \frac{\partial E}{\partial y} = -(t - y) = y - t$$
While the sigmoid activation function and its derivative are defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \frac{\partial \sigma(z)}{\partial z} = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$
The delta of the output layer is defined as:

$$\delta = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial z} = (y - t)\, y\, (1 - y)$$

which, up to the sign convention, is exactly the expression in your question.
And this holds for any cost function: the delta is always the derivative of the cost with respect to the output multiplied by the derivative of the activation function; only the first factor changes when you swap cost functions.
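As a sanity check, here is a small sketch (my own, with made-up numbers) for a single sigmoid output unit. It computes the delta both analytically as $(y - t)\,y\,(1 - y)$ and numerically as the finite-difference derivative of the squared error with respect to the pre-activation $z$; the two agree up to numerical error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_error(z, t):
    """Squared error as a function of the pre-activation z of the output unit."""
    y = sigmoid(z)
    return 0.5 * (t - y) ** 2

z, t = 0.4, 1.0                  # arbitrary pre-activation and target
y = sigmoid(z)

# Analytic delta: dE/dy * dy/dz = (y - t) * y * (1 - y)
delta_analytic = (y - t) * y * (1 - y)

# Numeric delta: central finite difference of E with respect to z
h = 1e-6
delta_numeric = (squared_error(z + h, t) - squared_error(z - h, t)) / (2 * h)

print(delta_analytic, delta_numeric)  # should match to within ~1e-9
```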