artificial-intelligence, backpropagation

Backpropagation overall error chart with very small slope... Is this normal?


I'm training a neural network with the backpropagation algorithm, and this is the chart of the overall error:

[Chart: overall training error vs. epoch, decreasing with a very small slope]

(I'm calculating the overall error using the formula from http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html, Part 6.3: Overall training error.)
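For concreteness, one common way to define an overall training error is the RMS of the output errors over all training patterns and output units; this is a hedged sketch of that idea (the linked article may normalise slightly differently):

```python
import numpy as np

def overall_error(targets, outputs):
    """RMS error over all training patterns and output units.

    Assumption: 'overall error' means root-mean-square of
    (target - output) across every pattern and output neuron.
    """
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return np.sqrt(np.mean((targets - outputs) ** 2))

# Example: two training patterns, three output units each
t = [[1, 0, 0], [0, 1, 0]]
o = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
print(overall_error(t, o))
```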

I fitted a power trendline to the chart; extrapolating it gives an overall error of 0.2 at 13000 epochs.
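A power trendline like the one used above can be reproduced by fitting a straight line in log-log space; this sketch uses made-up (epoch, error) samples standing in for values read off the chart:

```python
import numpy as np

# Hypothetical (epoch, overall error) samples read off the chart
epochs = np.array([100.0, 500.0, 1000.0, 2000.0, 5000.0])
errors = np.array([0.9, 0.6, 0.5, 0.4, 0.3])

# Fit error ~ a * epoch**b (a power trendline) via a linear
# least-squares fit in log-log space
b, log_a = np.polyfit(np.log(epochs), np.log(errors), 1)
a = np.exp(log_a)

# Extrapolate the fitted curve out to 13000 epochs
predicted = a * 13000.0 ** b
print(f"error ~ {a:.3f} * epoch^{b:.3f}; at 13000 epochs ~ {predicted:.3f}")
```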

Isn't this too high?

Is this chart normal? It seems the training process will take too long. What should I do? Is there a faster way?

EDIT: My neural network has one hidden layer with 200 neurons; the input and output layers each have 10-12 neurons. My problem is clustering characters (it clusters Persian characters into groups using supervised training).


Solution

  • So you are using an ANN with 200 input nodes and 10-12 nodes in the hidden layer. What activation function, if any, are you using for your hidden and output layers?

    Is this a standard backpropagation training algorithm, and which training function are you using? Each type of training function affects the speed of training and, in some cases, the network's ability to generalise: you don't want to fit your training data so closely that the network is only good on that data.

    Ideally you want decent training data, which could be a subsample of your real data, say 15%. You could train your network using a conjugate gradient based algorithm: http://www.mathworks.co.uk/help/toolbox/nnet/ug/bss331l-1.html#bss331l-2. This will train your network quickly.
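The MATLAB link above uses its own conjugate gradient trainers; outside MATLAB, the same idea can be sketched by minimising the network's error with SciPy's conjugate gradient method. The layer sizes and data here are illustrative, not the asker's actual setup:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy problem: 4 inputs -> 3 hidden -> 2 outputs (shapes are illustrative)
X = rng.standard_normal((20, 4))
T = rng.random((20, 2))

shapes = [(4, 3), (3,), (3, 2), (2,)]
sizes = [int(np.prod(s)) for s in shapes]

def unpack(w):
    """Slice a flat parameter vector into W1, b1, W2, b2."""
    parts, i = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(w[i:i + n].reshape(s))
        i += n
    return parts

def loss(w):
    W1, b1, W2, b2 = unpack(w)
    H = np.tanh(X @ W1 + b1)        # hidden layer, tanh activation
    Y = np.tanh(H @ W2 + b2)        # output layer
    return np.mean((Y - T) ** 2)    # overall (mean squared) error

w0 = rng.standard_normal(sum(sizes)) * 0.1

# Conjugate gradient minimisation of the training error
res = minimize(loss, w0, method="CG", options={"maxiter": 200})
print(f"error before: {loss(w0):.4f}, after CG: {res.fun:.4f}")
```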

    10-12 nodes may not be ideal for your data; try changing the number in blocks of 5, or add another layer. In general, more layers improve the network's ability to classify your problem, but they increase the computational complexity and hence slow down training.

    Presumably these 10-12 nodes are 'features' you are trying to classify?

    If so, you may wish to normalise them: rescale each to between 0 and 1, or -1 and 1, depending on your activation function (e.g. tanh sigmoid produces values in the range -1 to +1): http://www.heatonresearch.com/node/706

    You may also train a neural network to identify the ideal number of nodes you should have in your hidden layer.
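A simpler alternative to training a second network for this is a plain validation search over candidate hidden-layer sizes, stepping in blocks of 5 as suggested above. This sketch uses scikit-learn and random stand-in data in place of the asker's Persian character features:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-in for the character data: 200 samples, 12 features, 10 classes
X = rng.standard_normal((200, 12))
y = rng.integers(0, 10, size=200)

# Hold out 15% of the data for validation, as suggested above
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.15, random_state=0)

best = None
for hidden in (5, 10, 15, 20, 25):    # step through sizes in blocks of 5
    net = MLPClassifier(hidden_layer_sizes=(hidden,),
                        max_iter=300, random_state=0)
    net.fit(X_tr, y_tr)
    score = net.score(X_val, y_val)   # validation accuracy
    if best is None or score > best[1]:
        best = (hidden, score)

print(f"best hidden size: {best[0]} (validation accuracy {best[1]:.2f})")
```

On real data the best size reflects actual structure; here the data is random, so the search only demonstrates the mechanism.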