I'm trying to create a sample neural network that can be used for credit scoring. Since this is a complicated structure for me, i'm trying to learn them small first.
I created a network using back propagation - input layer (2 nodes), 1 hidden layer (2 nodes +1 bias), output layer (1 node), which makes use of sigmoid as activation function for all layers. I'm trying to test it first using a^2+b2^2=c^2 which means my input would be a and b, and the target output would be c.
My problem is that my input and target output values are real numbers which can range from (-/infty, +/infty). So when I'm passing these values to my network, my error function would be something like (target- network output). Would that be correct or accurate? In the sense that I'm getting the difference between the network output (which is ranged from 0 to 1) and the target output (which is a large number).
I've read that the solution would be to normalise first, but I'm not really sure how to do this. Should i normalise both the input and target output values before feeding them to the network? What normalisation function is best to use cause I read different methods in normalising. After getting the optimized weights and use them to test some data, Im getting an output value between 0 and 1 because of the sigmoid function. Should i revert the computed values to the un-normalized/original form/value? Or should i only normalise the target output and not the input values? This really got me stuck for weeks as I'm not getting the desired outcome and not sure how to incorporate the normalisation idea in my training algorithm and testing..
Thank you very much!!
So to answer your questions :
tanh
or relu
nolinearities. The detailed explaination (as well as some useful tips if you want to keep sigmoid as your activation) might be found here.age
and income
. First one varies from e.g. 5 to 90. Second one varies from e.g. 1000 to 100000. The mean absolute value is much bigger for income
than for age
so due to linear tranformations in your model - ANN is treating income
as more important at the beginning of your training (because of random initialization). Now consider that you are trying to solve a task where you need to classify if a person given has grey hair :) Is income
truly more important variable for this task?rules of thumb
on how you should normalize your input data. One is to squash all inputs to [0, 1]
interval. Another is to make every variable to have mean = 0
and sd = 1
. I usually use second method when the distribiution of a given variable is similiar to Normal Distribiution and first - in other cases.