Tags: neural-network, backpropagation

Why do we "unroll" thetas in neural network back propagation?


In backpropagation implementations, it seems to be the norm to unroll the thetas (flatten the theta matrices into a single one-dimensional vector) and then pass them as a parameter to the cost function.

To illustrate (assuming a 3-layer NN):

def NNCostFunction(unrolled_thetas, input_layer_size, hidden_layer_size, num_labels, X, y):

    # **ROLL AGAIN**: reshape unrolled_thetas back into theta1, theta2 (3-layer assumption)
    # Forward propagate to calculate the cost
    # Then backpropagate to calculate the deltas

    return cost, gradient_theta1, gradient_theta2
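
For concreteness, here is a minimal NumPy sketch of what that ROLL AGAIN step typically looks like, assuming the usual bias-augmented shapes theta1: (hidden_layer_size, input_layer_size + 1) and theta2: (num_labels, hidden_layer_size + 1); the helper name roll_thetas is just illustrative:

    import numpy as np

    def roll_thetas(unrolled_thetas, input_layer_size, hidden_layer_size, num_labels):
        # Split the flat vector and reshape each piece back into its weight matrix
        split = hidden_layer_size * (input_layer_size + 1)
        theta1 = unrolled_thetas[:split].reshape(hidden_layer_size, input_layer_size + 1)
        theta2 = unrolled_thetas[split:].reshape(num_labels, hidden_layer_size + 1)
        return theta1, theta2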

What makes me wonder is: why do we pass unrolled thetas to the function and then roll them again (restore the original shapes of the thetas) inside the function? Why don't we just pass the original thetas to the cost function?

I think I'm not grasping something important here. There must be a reason why we do this. Is it because the optimization routines in most languages only accept the parameters as a single vector? Please shed some light on my understanding! Thank you.


Solution

  • I figured it out. Unrolling is NOT specific to backpropagation.

    In order to use an off-the-shelf minimizer such as fmincg, the cost function has been set up to take the parameters unrolled into a single vector params! A minimal sketch of the idea follows below.
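
    A minimal sketch of how this plays out with SciPy's scipy.optimize.minimize (a Python analogue of fmincg); the cost and gradients below are placeholders just to show the unroll/roll calling convention, and the sizes are arbitrary:

        import numpy as np
        from scipy.optimize import minimize

        # Tiny dummy problem, just to show the calling convention
        input_layer_size, hidden_layer_size, num_labels = 4, 5, 3
        X = np.random.rand(10, input_layer_size)
        y = np.random.randint(num_labels, size=10)

        def nn_cost_function(unrolled_thetas, input_layer_size, hidden_layer_size,
                             num_labels, X, y):
            # ROLL AGAIN: reshape the flat vector back into theta1 and theta2
            split = hidden_layer_size * (input_layer_size + 1)
            theta1 = unrolled_thetas[:split].reshape(hidden_layer_size, input_layer_size + 1)
            theta2 = unrolled_thetas[split:].reshape(num_labels, hidden_layer_size + 1)

            # ... forward propagation -> cost, backpropagation -> grad1, grad2 ...
            cost = float(np.sum(unrolled_thetas ** 2))  # placeholder cost
            grad1 = 2 * theta1                          # placeholder gradient for theta1
            grad2 = 2 * theta2                          # placeholder gradient for theta2

            # The gradient is unrolled too, so it matches the 1-D parameter vector
            return cost, np.concatenate([grad1.ravel(), grad2.ravel()])

        # The optimizer only ever sees ONE flat 1-D vector of parameters
        theta1_init = 0.1 * np.random.randn(hidden_layer_size, input_layer_size + 1)
        theta2_init = 0.1 * np.random.randn(num_labels, hidden_layer_size + 1)
        initial_thetas = np.concatenate([theta1_init.ravel(), theta2_init.ravel()])

        result = minimize(nn_cost_function, initial_thetas, jac=True, method='CG',
                          args=(input_layer_size, hidden_layer_size, num_labels, X, y))
        # result.x is again a single flat vector; reshape it to recover the trained theta1, theta2

    This also explains why the gradient is unrolled as well: the minimizer compares and updates it element-wise against the flat parameter vector, so both must have the same 1-D shape.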