In other words, what is the main reason for switching the bias to a $b_j$ or to an additional $w_{ij} x_i$ in the neuron summation formula before the sigmoid? Performance? Which method is the best and why?
Note: $j$ is a neuron of the current layer and $i$ a neuron of a lower layer.
Note: it makes little sense to ask for the best
method here. Those are two different mathematical notations for exactly the same thing.
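To see that, write the pre-activation of neuron $j$ both ways (a sketch using the usual convention of a constant dummy input $x_0 = 1$):

$$z_j \;=\; \sum_{i=1}^{n} w_{ij}\,x_i + b_j \;=\; \sum_{i=0}^{n} w_{ij}\,x_i \qquad \text{with } x_0 = 1,\ w_{0j} = b_j.$$

Both expressions feed the same value into the sigmoid, so the network computes exactly the same function.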
However, fitting the bias as just another weight allows you to rewrite the sum as a scalar product of an observed feature vector $x_d$ with the weight vector $w$.
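As a rough illustration of that trick (the names `x_d`, `w`, and `b` below are placeholders, not from any particular library), prepending a constant 1 to the input absorbs the bias into the weight vector and turns "sum plus bias" into a single dot product:

```python
import numpy as np

# Separate bias: z = w . x + b
x_d = np.array([0.5, -1.2, 3.0])      # observed feature vector
w   = np.array([0.1,  0.4, -0.2])     # weights for those features
b   = 0.7                             # separate bias term
z_separate = np.dot(w, x_d) + b

# Bias absorbed as an extra weight: prepend a constant 1 to the input
x_aug = np.concatenate(([1.0], x_d))  # x_0 = 1
w_aug = np.concatenate(([b], w))      # w_0 = b
z_absorbed = np.dot(w_aug, x_aug)

assert np.isclose(z_separate, z_absorbed)  # identical pre-activation
```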
Have you tried to calculate the derivative w.r.t. $w$ in order to get the optimal $w$ according to least squares? You will notice that this calculation becomes much cleaner in vectorized notation.
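As a sketch (standard least-squares algebra, with the bias folded into $w$ via a column of ones in the design matrix $X$):

$$\frac{\partial}{\partial w}\,\lVert Xw - y\rVert^2 \;=\; 2\,X^\top (Xw - y) \;\overset{!}{=}\; 0 \quad\Longrightarrow\quad w \;=\; (X^\top X)^{-1} X^\top y.$$

The optimal bias is simply the component of $w$ that multiplies the constant column; with a separate $b$, the same derivation splits into two coupled conditions and gets messier.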
Apart from that: in many high-level programming languages, vectorized calculations are significantly more efficient than the non-vectorized equivalent. So performance is also a point in favour of the vectorized form, at least in some languages.
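A small sketch of that performance point in NumPy (timings vary by machine; the function names here are just illustrative):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)
w = rng.random(100_000)

def loop_dot():
    # explicit Python loop over the summation
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * xi
    return total

def vectorized_dot():
    # the same sum expressed as a single vectorized call
    return np.dot(w, x)

print("loop:      ", timeit.timeit(loop_dot, number=10))
print("vectorized:", timeit.timeit(vectorized_dot, number=10))
```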