I'm trying to understand this code from lightaime's Github page. It is a vetorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]"
What does this expression mean?
def softmax_loss_vectorized(W, X, y, reg):
"""
Softmax loss function, vectorize implementation
Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.
Inputs:
W: A numpy array of shape (D, C) containing weights.
X: A numpy array of shape (N, D) containing a minibatch of data.
y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
reg: (float) regularization strength
Returns a tuple of:
loss as single float
gradient with respect to weights W; an array of same shape as W
"""
# Initialize the loss and gradient to zero.
loss = 0.0
dW = np.zeros_like(W)
num_classes = W.shape[1]
num_train = X.shape[0]
scores = X.dot(W)
shift_scores = scores - np.max(scores, axis = 1).reshape(-1,1)
softmax_output = np.exp(shift_scores)/np.sum(np.exp(shift_scores), axis = 1).reshape(-1,1)
loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
loss /= num_train
loss += 0.5* reg * np.sum(W * W)
dS = softmax_output.copy()
dS[range(num_train), list(y)] += -1
dW = (X.T).dot(dS)
dW = dW/num_train + reg* W
return loss, dW
This expression means: slice an array softmax_output
of shape (N, C)
extracting from it only values related to the training labels y
.
Two dimensional numpy.array
can be sliced with two lists containing appropriate values (i.e. they should not cause an index error)
range(num_train)
creates an index for the first axis which allows to select specific values in each row with the second index - list(y)
. You can find it in the numpy documentation for indexing.
The first index range_num has a length equals to the first dimension of softmax_output
(= N
). It points to each row of the matrix; then for each row it selects target value via corresponding value from the second part of an index - list(y)
.
Example:
softmax_output = np.array( # dummy values, not softmax
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]]
)
num_train = 4 # length of the array
y = [2, 1, 0, 2] # a labels; values for indexing along the second axis
softmax_output[range(num_train), list(y)]
Out:
[3, 5, 7, 12]
So, it selects third element from the first row, second from the second row etc. That's how it works.
(p.s. Do I misunderstand you and you interested in "why", not "how"?)