machine-learning, neural-network, backpropagation

What exactly happens when bias units of neural networks are regularized?


I have implemented a few neural networks after initially learning about them from online tutorials. All of those tutorials mention that bias units are not taken into account during regularization, but that regularizing them does not result in any major difference.

I don't understand:

  • What really happens if I regularize the entire weight matrix, including the biases?
  • Does it really never produce any major difference, or are there some edge cases?

Solution

  • Theoretically, if you regularized your biases you would be removing some flexibility from how your network functions. Because a bias is never multiplied by an input, allowing it to grow large in magnitude lets a neuron saturate quickly without being driven by outlier input values that amount to noise in the training data. By contrast, multiplying a large weight by an input value that is very atypical of the population you are studying amplifies the extent to which your network conforms itself to that outlier example, and the network won't generalize as well to held-out data. (The first sketch at the end of this answer contrasts the two penalty schemes.)

    Your tutorials probably had exercises showing how regularization of weights dramatically narrows the gap between training accuracy and test/validation accuracy. The problem with regularizing biases, though, is that there is little empirical evidence that it changes network performance, even though intuition suggests it should hurt. This gap between an appealing theory and what experiments actually show is something you will encounter many more times in your study of neural networks. It means there is a lot more research to be done! (If you want to check it on your own networks, the second sketch at the end of this answer shows one way to run the comparison.)

    In summary, whether or not to regularize biases currently boils down to personal preference, since no dramatic difference is observed compared to leaving the biases unregularized. Remember, the convention of excluding biases is a heuristic derived from empirical observation and still lacks a convincing theoretical underpinning.
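To make the two options concrete, here is a minimal NumPy sketch (not from the original answer; the layer shapes, the names W, b, and the strength lam are illustrative assumptions) showing how an L2 penalty over the weights only differs from one that also covers the bias vector, and what each adds to the gradients during backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weight matrix of one layer (hypothetical shape)
b = rng.normal(size=(3,))     # bias vector of the same layer
lam = 0.01                    # regularization strength

# Common practice: penalize the weights only.
penalty_weights_only = lam * np.sum(W ** 2)

# The alternative being asked about: penalize the biases as well.
penalty_with_biases = lam * (np.sum(W ** 2) + np.sum(b ** 2))

# Corresponding gradient contributions during backpropagation:
grad_W = 2 * lam * W                      # added to dL/dW under both schemes
grad_b_weights_only = np.zeros_like(b)    # biases left untouched
grad_b_with_biases = 2 * lam * b          # biases shrunk toward zero each step
```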
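And if you want to test the claim empirically, one way to set it up (a sketch only, assuming PyTorch and a small placeholder model; none of this comes from the original answer) is to put the biases in their own optimizer parameter group so that weight decay, PyTorch's built-in L2 regularization, can be switched on or off for them independently:

```python
import torch
import torch.nn as nn

# Placeholder model purely for illustration.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Bias vectors (1-D parameters) go into the "no decay" group.
    (no_decay if param.dim() == 1 else decay).append(param)

# Weights regularized, biases not (the usual recommendation).
opt_weights_only = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.1,
)

# Everything regularized, biases included (what the question asks about).
opt_with_biases = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```

Training the same model with each optimizer for the same number of epochs and comparing the resulting train/validation gaps is, in spirit, the experiment the tutorials describe for weight regularization, applied to the bias question instead.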