Tags: machine-learning, deep-learning, neural-network, conv-neural-network, bias-neuron

When to use / not use bias term in convolutional neural networks


This question has recently popped up in my mind. I've asked GPT and a couple of other models about the importance of the bias term in convolutional networks. All of them responded differently and rather superficially. I also occasionally see Kaggle notebooks where people set `bias=False` or `bias=True` in conv / dense layers when training their models. Can you share insights about why the bias term might be important and when to consider enabling / disabling it? Thanks.


Solution

  • One thing to bear in mind is that for many popular choices of activation function (e.g. ReLU), any neuron that doesn't have a bias maps an input value of zero to an output value of zero. Likewise, if your whole network uses such activation functions (without normalisation), the same applies: zero inputs get mapped to zero outputs, so dark pixels (with value zero) map to zero and the network effectively behaves linearly around them. If you want all pixels to behave non-linearly (which is generally what you want from a neural network), one solution is to use biases; the first sketch at the end of this answer illustrates the difference.

    The situation is slightly different for transformers: they often don't use biases in their linear layers, partly because they use frequent Layer Normalisation layers, which effectively add their own biases (second sketch below).

    But in some cases, e.g. Swin transformers, the size of the attention map is always known (it equals the window size), so they add a learned positional bias directly to the attention maps (third sketch below).
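
    Here is a minimal sketch of the first point, assuming PyTorch (layer sizes and names are just illustrative): with `bias=False`, a zero input stays exactly zero after a conv + ReLU, while with `bias=True` it generally doesn't.

    ```python
    import torch
    import torch.nn as nn

    x = torch.zeros(1, 3, 8, 8)  # an all-dark (zero-valued) input patch

    conv_no_bias = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False)
    conv_with_bias = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=True)
    relu = nn.ReLU()

    # Without a bias, zero in -> zero out, no matter what the weights are
    print(relu(conv_no_bias(x)).abs().max())    # tensor(0.)

    # With a bias, the output can be non-zero even for a zero input
    print(relu(conv_with_bias(x)).abs().max())  # almost always > 0
    ```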
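
    For the transformer point, another rough PyTorch sketch (purely illustrative): the projection's own bias is switched off, but the `nn.LayerNorm` that follows still learns a per-channel shift (its beta parameter), which plays much the same role.

    ```python
    import torch.nn as nn

    # Transformer-style sub-block: the linear projection drops its bias ...
    proj = nn.Linear(512, 512, bias=False)

    # ... because the LayerNorm that follows already has a learnable shift (beta)
    norm = nn.LayerNorm(512)
    print(norm.bias.shape)  # torch.Size([512])
    ```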
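
    And a deliberately simplified sketch of the Swin-style idea (the real model parameterises this with a relative-position table indexed by pairwise offsets; this only shows the shape of the trick): because the window size is fixed, a learned bias with one entry per head and per query/key position can be added straight onto the attention logits.

    ```python
    import torch
    import torch.nn as nn

    window, num_heads, head_dim = 7, 4, 32
    tokens = window * window  # attention map size is known in advance

    # One learned additive bias per head and per query/key position in the window
    pos_bias = nn.Parameter(torch.zeros(num_heads, tokens, tokens))

    q = torch.randn(1, num_heads, tokens, head_dim)
    k = torch.randn(1, num_heads, tokens, head_dim)

    attn = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    attn = attn + pos_bias          # bias added directly to the attention map
    attn = attn.softmax(dim=-1)
    ```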