I am following this tutorial on NN and backpropagation.
I am new to Python and am trying to convert the code to MATLAB. Can someone kindly explain the following line of code (from the tutorial):
delta3[range(num_examples), y] -= 1
In short, and if I am not mistaken, delta3 and y are vectors and num_examples is an integer.
It is my understanding that delta3 = probs - y, as in this Mathematics Stack Exchange entry (thank you @rayryeng). Why and when should I subtract 1?
Otherwise, can anybody direct me to an online site where I can simply run and follow the code? I was getting errors everywhere I tried to run it (including my home PC):
"NameError: name 'sklearn' is not defined" (probably an import I am missing)
The line delta3[range(num_examples), y] -= 1 is part of calculating the gradient of the softmax loss function. I refer you to this nice link that gives you more information on how this loss function is formulated and the intuition behind it: http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/.
In addition, I refer you to this post on Mathematics Stack Exchange that shows you how the gradient of the softmax loss is derived: https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function. Consider the first link a deep dive and the second link a tl;dr of the first.
The gradient of the softmax loss function is the gradient of the output layer which you would need to propagate backwards into the layer before the output layer to continue with the backpropagation algorithm.
Summarizing the post I have linked above: if you calculate the gradient of the softmax loss for a training example, then for each class the gradient of the loss is simply the softmax value evaluated for that class. You additionally need to subtract 1 from that value for the class the training example actually belongs to. Remember that the gradient of an example for a class i is equal to p_i - y_i, where p_i is the softmax score of class i for the example and y_i is the classification label using a one-hot encoding scheme. Specifically, y_i = 0 if i is not the true class of the example and y_i = 1 if it is.
delta3 contains the gradient of the softmax loss function per example in your mini-batch. Specifically, it is a 2D matrix where the number of rows is equal to the number of training examples, num_examples, while the number of columns is the total number of classes.
First, we calculate the softmax scores for each training example and for each class. Next, for each row of the gradient, we determine the column that corresponds to the true class the example belongs to and subtract 1 from that score. range(num_examples) generates the row indices from 0 up to num_examples - 1, and y contains the true class label per example. Therefore, each pair of values from range(num_examples) and y identifies the row and column location from which to subtract 1 to finalize the gradient of the loss function.
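The indexing described above can be tried on a tiny example. This is a minimal sketch assuming NumPy is installed; the probs values and the labels in y are made up for illustration and are not taken from the tutorial.

```python
import numpy as np

# Hypothetical softmax outputs: 3 examples (rows) x 3 classes (columns).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y = np.array([2, 0, 1])            # hypothetical 0-based true class per example
num_examples = probs.shape[0]

delta3 = probs.copy()
# Row i is paired with column y[i], so only the entries
# (0, 2), (1, 0) and (2, 1) are decremented.
delta3[range(num_examples), y] -= 1

print(delta3)
```

Only one entry per row changes, and because each row of probs sums to 1, each row of delta3 now sums to 0.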
Now, in the Mathematics Stack Exchange post as well as in your understanding, the gradient is delta3 = probs - y. This assumes that y is a one-hot encoded matrix, meaning that y has the same size as probs and each row of y is all zero except for the column index of the correct class, which is set to 1. Therefore, if you generated a matrix y where each row is all zero except for the class that example belongs to, subtracting it from probs is equivalent to simply accessing the right column for each row and subtracting 1 from the score.
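To check that the two formulations agree, here is a small sketch, again assuming NumPy and using made-up values. The name y_onehot is introduced here for illustration, and building it by indexing np.eye is just one possible way to get a one-hot matrix.

```python
import numpy as np

# Hypothetical softmax outputs and 0-based labels, for illustration only.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y = np.array([2, 0, 1])
num_examples, num_classes = probs.shape

# One-hot matrix: row i is all zeros except for a 1 in column y[i].
y_onehot = np.eye(num_classes)[y]

# Formulation 1: subtract the full one-hot matrix.
delta_onehot = probs - y_onehot

# Formulation 2: subtract 1 only at the true-class entries.
delta_indexed = probs.copy()
delta_indexed[range(num_examples), y] -= 1

same = np.allclose(delta_onehot, delta_indexed)
print(same)
```

The indexed version avoids allocating the extra one-hot matrix, which is presumably why the tutorial uses it.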
In MATLAB you need to create linear indices in order to facilitate this subtraction. Specifically, you use sub2ind to convert these row and column locations to linear indices, and then you can access the gradient matrix and subtract 1 from those values.
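% Assumes y is a row vector of 0-based labels with num_examples elements
ind = sub2ind(size(delta3), 1 : num_examples, y + 1); % linear index of (row i, true class of example i)
delta3(ind) = delta3(ind) - 1; % subtract 1 at the true-class entries only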
In the Python tutorial you have linked, the class labels are assumed to run from 0 up to N-1, where N is the total number of classes. You must be careful in MATLAB, where array indexing starts at 1, so I have added 1 to y in the above code to ensure that your labels start at 1 instead of 0. ind contains the linear indices of the row and column locations that we need to access, and we complete the subtraction using those indices.
If you were to formulate this using the knowledge that you gained from your edit, you would do this instead:
% One-hot label matrix with the same size as probs (y holds 0-based labels)
ymatrix = full(sparse(1 : num_examples, y + 1, 1, size(probs, 1), size(probs, 2)));
delta3 = probs - ymatrix;
ymatrix contains the matrix that I talked about, where each row corresponds to an example and is all zeroes except for the column of the class the example belongs to, which is 1. What you may not have seen before are the sparse and full functions. sparse allows you to create a zero matrix and specify the row and column locations that are non-zero, as well as the values that each of these locations takes on. In this case, I'm accessing exactly one element per row, using the class ID of the example to select the column, and setting each of these locations to 1. Also remember that I'm adding 1 as I'm assuming your class IDs start from 0. Because this is a sparse matrix, I then convert it with full to give you a regular numeric matrix rather than a sparse representation. This code is therefore equivalent in operation to the previous code snippet I showed. However, it is more efficient to do it the first way, as you are not creating an additional matrix to facilitate the gradient computation. You are modifying the gradient in place instead.
As a side note, sklearn is the scikit-learn Python machine learning package. The NameError means the name sklearn was used before it was imported, so first check that the script has an import sklearn statement at the top. If the import itself fails, the package is not installed. To install it, use pip or easy_install; the package is published under the name scikit-learn, so on your command line it's as simple as:
pip install scikit-learn
easy_install scikit-learn
However, scikit-learn should not be required for you to run the above subtraction code. You do need NumPy though so make sure you have that package installed.
For pip:
pip install numpy
... and for easy_install:
easy_install numpy
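If you are unsure whether the packages are visible to the interpreter you are running, a quick check along these lines may help. It is a sketch using only the standard library; is_installed is a hypothetical helper name, and sklearn is the import name of the scikit-learn package.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None

# Report on the two packages discussed above without importing them.
for name in ("numpy", "sklearn"):
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

If a package is reported as installed but the script still fails, the likelier cause is a missing import statement in the script itself.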