Tags: neural-network, backpropagation

Different loss functions for backpropagation


I came across a few different error calculation functions used for backpropagation: the squared error function from http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/:

E_total = Σ 1/2 * (target - output)^2

and, from a nicely explained derivation of the backpropagation rule, this formula:

Error = Output(i) * (1 - Output(i)) * (Target(i) - Output(i))
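
For example, plugging in the made-up values Output(i) = 0.8 and Target(i) = 1.0 gives Error = 0.8 * (1 - 0.8) * (1.0 - 0.8) = 0.032.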

Now I'm wondering how many more there are, and what effect the choice of function has on training?

Also, since I understand that the second example uses the derivative of the activation function used by the layer, does the first one also do this in some way? And would that be true for any loss function (if there are more)?

Finally, how do I know which one to use, and when?


Solution

  • This is a very broad question, but I can shed some light on the error/cost function part.

    Cost functions

    There are many different cost functions that can be applied when working with neural networks, and none of them are specific to neural networks. The most common ones in NNs are probably Mean Squared Error (MSE) and the cross-entropy cost function. The latter is often the most appropriate when working with logistic or softmax output layers; MSE, on the other hand, is convenient since it does not require the output values to be in the range [0, 1].
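
    For illustration, here is a minimal numpy sketch (not from the original answer; names and values are made up) that computes both costs for the same predictions:

        import numpy as np

        def mse(target, output):
            # Mean Squared Error: no constraint on the range of the outputs
            return np.mean((target - output) ** 2)

        def cross_entropy(target, output, eps=1e-12):
            # Binary cross entropy: assumes outputs in (0, 1),
            # e.g. from a logistic (sigmoid) output layer
            output = np.clip(output, eps, 1 - eps)
            return -np.mean(target * np.log(output)
                            + (1 - target) * np.log(1 - output))

        target = np.array([1.0, 0.0, 1.0])
        output = np.array([0.8, 0.2, 0.6])
        print(mse(target, output))            # 0.08
        print(cross_entropy(target, output))  # ~0.319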

    The different cost functions exhibit different convergence properties and have their own pros and cons. You'll have to read up on the ones that interest you.

    List of cost functions

    Danielle Ensign has compiled a short, nice list of cost functions over at CrossValidated.


    Sidenote

    You have mislabeled the derivative of the squared error function. The equation you've written as the derivative of the error function is actually the derivative of the error function times the derivative of your output layer's activation function. This product is the delta of the output layer.

    The squared error function and its derivative are defined as:
    E = 1/2 * (Target - Output)^2
    dE/dOutput = -(Target - Output)

    While the sigmoid activation function and its derivative are defined as:
    sigma(x) = 1 / (1 + e^(-x))
    sigma'(x) = sigma(x) * (1 - sigma(x))

    The delta of the output layer is defined as:
    delta = dE/dOutput * sigma'(Net) = -(Target - Output) * Output * (1 - Output)

    Note that this is the negative of the formula in your question: that formula is the negative gradient, i.e. the direction the weight update actually moves in. The chain-rule structure itself (cost derivative times activation derivative) is true for all cost functions; only the dE/dOutput factor changes.
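
    As a minimal sketch of that chain rule (illustrative numpy code, not part of the original answer):

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def sigmoid_prime(x):
            s = sigmoid(x)
            return s * (1.0 - s)       # sigma(x) * (1 - sigma(x))

        def squared_error_prime(target, output):
            return output - target     # dE/dOutput for E = 1/2 * (Target - Output)^2

        net = np.array([0.5, -1.0])    # made-up pre-activations of the output layer
        target = np.array([1.0, 0.0])
        output = sigmoid(net)

        # Output-layer delta: cost derivative times activation derivative.
        # Swapping in another cost function only changes squared_error_prime.
        delta = squared_error_prime(target, output) * sigmoid_prime(net)
        print(delta)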