Tags: python, machine-learning, softmax

implementing softmax method in python


I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is `softmax_output[range(num_train), list(y)]`.

What does this expression mean?

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.

    Inputs:
        W: A numpy array of shape (D, C) containing weights.
        X: A numpy array of shape (N, D) containing a minibatch of data.
        y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
        reg: (float) regularization strength

    Returns a tuple of:
        loss as single float
        gradient with respect to weights W; an array of same shape as W
    """

    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)


    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    shift_scores = scores - np.max(scores, axis = 1).reshape(-1,1)
    softmax_output = np.exp(shift_scores)/np.sum(np.exp(shift_scores), axis = 1).reshape(-1,1)
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))   
    loss /= num_train 
    loss +=  0.5* reg * np.sum(W * W)

    dS = softmax_output.copy()
    dS[range(num_train), list(y)] += -1
    dW = (X.T).dot(dS)
    dW = dW/num_train + reg* W
    return loss, dW
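A side note on the `shift_scores` line in the code above: subtracting each row's maximum before exponentiating is a standard numerical-stability trick. It leaves the softmax probabilities mathematically unchanged, but prevents `np.exp` from overflowing on large scores. A minimal sketch with dummy scores (not taken from the original code):

```python
import numpy as np

scores = np.array([[1000.0, 1001.0, 1002.0]])  # large enough to overflow exp

# Naive softmax would compute np.exp(1000) -> inf, giving nan probabilities.
# Shifted softmax subtracts the row max first; the result is identical in exact
# arithmetic because the common factor exp(max) cancels in the ratio.
shifted = scores - np.max(scores, axis=1).reshape(-1, 1)
probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1).reshape(-1, 1)

print(probs)  # each row sums to 1, no overflow
```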

Solution

  • This expression means: index the array softmax_output of shape (N, C), extracting from it only the values corresponding to the training labels y.

    A two-dimensional numpy array can be indexed with two lists containing appropriate values (i.e. they should not cause an index error).

    range(num_train) creates an index for the first axis, which allows selecting a specific value in each row via the second index, list(y). You can find this in the numpy documentation on indexing.

    The first index, range(num_train), has length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; then, for each row, it selects the target value via the corresponding entry of the second index, list(y).

    Example:

    softmax_output = np.array(  # dummy values, not softmax
        [[1, 2, 3], 
         [4, 5, 6],
         [7, 8, 9],
         [10, 11, 12]]
    )
    num_train = 4  # length of the array
    y = [2, 1, 0, 2]  # labels; values for indexing along the second axis
    softmax_output[range(num_train), list(y)]
    Out:
    array([ 3,  5,  7, 12])
    

    So, it selects third element from the first row, second from the second row etc. That's how it works.
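    The same selection can be written as an explicit loop over the rows, which may make the pairing of row index i with column index y[i] clearer (the loop version is added here purely for illustration):

    ```python
    import numpy as np

    softmax_output = np.array(  # dummy values, not softmax
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [10, 11, 12]]
    )
    y = [2, 1, 0, 2]
    num_train = 4

    # Fancy indexing: pairs row i with column y[i]
    fancy = softmax_output[range(num_train), list(y)]

    # Equivalent explicit loop over the rows
    looped = np.array([softmax_output[i, y[i]] for i in range(num_train)])

    print(fancy)   # [ 3  5  7 12]
    print(looped)  # [ 3  5  7 12]
    ```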

    (P.S. Or do I misunderstand you, and you are interested in the "why", not the "how"?)