
Determining Bias for Neural Network Perceptrons?


One thing I don't quite understand as I begin learning about neural networks is what to initially set the "bias" to. I understand that a perceptron calculates its output based on:

P * W + b > 0

and then you can update the bias with the learning rule b = b + [G - O], where G is the correct (target) output and O is the actual output (1 or 0). But what about the initial bias? I don't really understand how it is chosen, or what initial value should be used besides just "guessing". Is there any kind of formula for this?

Pardon me if I'm mistaken on anything; I'm still learning the whole neural network idea before I implement my own (crappy) one.

The same goes for the learning rate: most books and such just kinda "pick one" for μ.
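To make sure I'm describing the rule correctly, here's a rough Python sketch of what I mean (the function names are just mine):

```python
def perceptron_output(inputs, weights, bias):
    # fires 1 when P * W + b > 0 (P * W taken as a dot product), else 0
    total = sum(p * w for p, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def update_bias(bias, target, output):
    # b = b + [G - O]: nudge the threshold toward the correct answer
    return bias + (target - output)
```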


Solution

  • The short answer is, it depends...

1. In most cases (I believe) you can treat the bias just like any other weight (so it might get initialised to some small random value), and it will get updated as you train your network. The idea is that all the biases and weights will end up converging on some useful set of values (there's a code sketch at the end of this answer).

    2. However, you can also set the weights manually (with no training) to get some special behaviours: for example, you can use the bias to make a perceptron behave like a logic gate (assume binary inputs X1 and X2 are either 0 or 1, and the activation function is scaled to give an output of 0 or 1).

    OR gate: W1=1, W2=1, Bias=0

    AND gate: W1=1, W2=1, Bias=-1

    You can solve the classic XOR problem by using AND and OR as the first layer of a multilayer network, and feeding them into a third perceptron with W1=3 (from the OR gate), W2=-2 (from the AND gate) and Bias=-2, like this:

    [diagram: two-layer perceptron network computing XOR]

    (Note: these values will be different if your activation function is scaled to -1/+1, i.e. a sign (SGN) function.)


    3. As for how to set the learning rate, that also depends(!), but I think something like 0.01 is usually recommended. Basically you want the system to learn as quickly as possible, but not so quickly that the weights fail to converge properly.
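To tie the points together, here's a minimal Python sketch (the function names and the 0/1 step activation are my assumptions from the setup above): it hard-codes the gate weights from point 2, verifies the XOR network, and then shows points 1 and 3 by initialising the bias to a small random value and training it with the same rule as the weights.

```python
import random

def step(x):
    # 0/1 step activation, matching the gate examples above
    return 1 if x > 0 else 0

def predict(inputs, weights, bias):
    return step(sum(p * w for p, w in zip(inputs, weights)) + bias)

# Point 2: weights set by hand, no training needed
OR_W,  OR_B  = [1, 1],  0
AND_W, AND_B = [1, 1], -1
XOR_W, XOR_B = [3, -2], -2   # second layer: takes (OR output, AND output)

def xor(x1, x2):
    o = predict([x1, x2], OR_W, OR_B)      # first-layer OR unit
    a = predict([x1, x2], AND_W, AND_B)    # first-layer AND unit
    return predict([o, a], XOR_W, XOR_B)   # combine in the second layer

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor(x1, x2))   # prints the XOR truth table

# Points 1 and 3: random initial bias, trained like any other weight
def train(samples, lr=0.01, epochs=1000):
    weights = [random.uniform(-0.5, 0.5) for _ in range(2)]
    bias = random.uniform(-0.5, 0.5)     # no formula needed: just start small
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            weights = [w + lr * error * p for w, p in zip(weights, inputs)]
            bias += lr * error           # same update rule; its input is always 1
    return weights, bias

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train(and_data))   # learns an AND-like weight/bias set from data
```

The exact values the trained version converges to will differ run to run (and from the hand-set ones above), which is the point of item 1: the initial bias mostly doesn't matter, because training moves it where it needs to be.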