Tags: neural-network, backpropagation

Neural Network Intercept Bias in this tutorial


I am going through a NN tutorial from this website:

http://peterroelants.github.io/posts/neural_network_implementation_part03/

I am confused about one particular paragraph in this page (screenshot below).

[Screenshot of the relevant paragraph from the tutorial]

  1. Is the choice of the intercept bias of -1 purely arbitrary? I don't quite understand his explanation.

  2. The screenshot says that the RBF function maps all values to a range of [0, +infinity]. However, the RBF function only maps to a range of [0, 1]. Is this a mistake? And how does this positive range lead to the choice of a -1 intercept bias?


Solution

  • I'm the author of this blogpost. Thanks for bringing up these issues.

    1. Is the choice of the intercept bias of -1 purely arbitrary? I don't quite understand his explanation.

    This is arbitrary in the sense that it could be set to other values. The reason I fixed the bias term is that I wanted a network with only 2 trainable parameters (but with a hidden layer), so I could illustrate the cost function in a 2D plot. The whole intention of this section is to illustrate that adding a hidden layer with a non-linear activation function allows you to separate data that is not linearly separable.
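
    In code, the network in question boils down to something like the following minimal sketch (my variable names wh and wo are illustrative, not the exact code from the post):

        import numpy as np

        def rbf(z):
            # Gaussian radial basis function used as the hidden activation
            return np.exp(-z**2)

        def logistic(z):
            return 1.0 / (1.0 + np.exp(-z))

        def forward(x, wh, wo, bo=-1.0):
            # Only wh and wo are trained; the output bias bo is fixed at -1
            h = rbf(wh * x)  # hidden layer: a weight but no bias term
            return logistic(wo * h + bo)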

    There are essentially 2 possible bias terms that are kept constant. The first one is the bias term on the hidden layer. This hidden-layer bias term would add a shift to the RBF function (yellow and green in the following figure); by removing the bias and only keeping a weight parameter we only model the 'width' of the RBF function (blue, red & pink in the following figure).

    [Figure: RBF function bias & weight variations (interactive plot linked in the original post)]
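
    A quick way to see this width effect numerically (a small sketch, reusing the rbf definition above):

        import numpy as np

        def rbf(z):
            return np.exp(-z**2)

        x = np.linspace(-3.0, 3.0, 7)
        for w in (0.5, 1.0, 2.0):
            # A larger |w| narrows the bump; the peak stays at x = 0 with
            # value 1, which is all a bias-free weight can change
            print(w, np.round(rbf(w * x), 3))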

    The second bias term is the bias on the logistic function of the output layer. This is related to how we decide to classify something as red or blue: we do this by rounding towards the nearest integer (0 or 1). By choosing this bias we can choose the decision boundary. Note in the following figure that if this bias is 0, everything > 0.5 is classified as 1 and everything < 0.5 is classified as 0. If we have bias +1 this boundary value is increased to around 0.7, and if we have bias -1 this boundary value is decreased to around 0.3.

    [Figure: logistic bias variations (interactive plot linked in the original post)]
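
    This shift is easy to verify numerically (a small sketch; the 'boundary value' is the shifted logistic evaluated at z = 0):

        import numpy as np

        def logistic(z):
            return 1.0 / (1.0 + np.exp(-z))

        for b in (-1.0, 0.0, 1.0):
            # Value of the shifted logistic at z = 0:
            # b = -1 -> ~0.269, b = 0 -> 0.5, b = +1 -> ~0.731
            print(b, round(float(logistic(0.0 + b)), 3))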

    If we input the RBF function into the logistic function without a bias on the logistic function, we notice that the resulting function is either completely above 0.5 or completely below 0.5 (black, blue and red graphs in the following image). By adding a bias we shift our output function so that part of it is below 0.5 and part is above 0.5 (yellow and green), and we can actually make a classification decision by rounding to the nearest integer (0 or 1). By (arbitrarily) choosing the bias to be -1 we can make this classification decision without putting everything in one class.

    [Figure: RBF + logistic variations (interactive plot linked in the original post)]
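
    The same effect in numbers (a sketch; the output weight of 2.0 is an arbitrary illustrative choice, not a trained value):

        import numpy as np

        def rbf(z):
            return np.exp(-z**2)

        def logistic(z):
            return 1.0 / (1.0 + np.exp(-z))

        x = np.linspace(-2.0, 2.0, 401)
        for b in (0.0, -1.0):
            y = logistic(2.0 * rbf(x) + b)
            # With b = 0 every output stays above 0.5 (everything rounds to
            # class 1); with b = -1 the outputs straddle 0.5, so rounding can
            # assign both classes
            print(b, round(float(y.min()), 3), round(float(y.max()), 3))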

    Having tried to explain this, I do agree that the example I picked is not that clear; suggestions for a better example are always welcome.

    2. The screenshot says that the RBF function maps all values to a range of [0, +infinity]. However, the RBF function only maps to a range of [0, 1]. Is this a mistake? And how does this positive range lead to the choice of a -1 intercept bias?

    My reasoning here is that it doesn't matter much whether the upper bound is 1 or infinity; the only thing that matters is the lower bound at 0. I agree that I could be more technically correct and clearer here, and I will update it in the future.
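
    For reference, a one-off check of the range of the Gaussian RBF used here (a small sketch):

        import numpy as np

        z = np.linspace(-10.0, 10.0, 100001)
        r = np.exp(-z**2)
        # The output peaks at exactly 1 (at z = 0) and approaches but never
        # reaches 0, so the range is (0, 1]; the lower bound at 0 is what
        # matters for the choice of bias
        print(float(r.min()) > 0.0, float(r.max()) == 1.0)  # True True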

    These blogposts are essentially a rendering of notebook experiments I did for myself. I agree they could be clearer, and feedback is always welcome.