Tags: python, numpy, pytorch, loss-function

Log-likelihood function in NumPy


I followed this tutorial and was confused by the part where the author defines the negative log-likelihood loss function.

def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll

Here, target.shape[0] is 64 and target is a vector of length 64:

tensor([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0, 9, 1, 1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9, 3, 9, 8, 5, 9, 3, 3, 0, 7, 4, 9, 8, 0, 9, 4, 1, 4, 4, 6, 0]).

How does that NumPy-style indexing produce the loss value? More generally, what is the result of indexing an array with a range() and another array inside the square brackets?


Solution

  • In the tutorial, both input and target are torch.Tensors.

    The negative log-likelihood loss is computed as follows:

    nll = -(1/B) * sum(logPi[target_i])  # sum over all samples i in the batch

    Where:

    • B: The batch size
    • C: The number of classes
    • Pi: the probability vector of the prediction for sample i, of shape [C,]. It is obtained by taking the softmax of the logit vector for sample i.
    • logPi: the logarithm of Pi, which we can get directly with F.log_softmax(logit_i) (see the sketch below).
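
    As a minimal sketch of that last point (the names logit, P, and logP are illustrative, not from the tutorial):

    import torch
    import torch.nn.functional as F

    logit = torch.randn(1, 3)           # a logit vector for one sample (C = 3)
    P = F.softmax(logit, dim=1)         # Pi: probabilities, rows sum to 1
    logP = F.log_softmax(logit, dim=1)  # logPi, more stable than P.log()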

    Let's break it down for an easy example:

    • input is expected to be the log_softmax values, of shape [B, C].
    • target is expected to be the ground-truth classes, of shape [B, ].

    To keep things uncluttered, let's take B = 4 and C = 3.

    import torch 
    
    B, C = 4, 3
    
    input = torch.randn(B, C)  # stand-in for log_softmax values in this example
    """
    >>> input
    tensor([[-0.5043,  0.9023, -0.4046],
            [-0.4370, -0.8637,  0.1674],
            [-0.5451, -0.5573,  0.0531],
            [-0.6751, -1.0447, -1.6793]])
    """
    
    target = torch.randint(low=0, high=C, size=(B, ))
    """
    >>> target
    tensor([0, 2, 2, 1])
    """
    
    # The unrolled version
    nll = 0
    nll += input[0][target[0]] # add -0.5043
    nll += input[1][target[1]] # add  0.1674
    nll += input[2][target[2]] # add  0.0531
    nll += input[3][target[3]] # add -1.0447
    nll *= (-1/B)
    print(nll)
    # tensor(0.3321)
    
    
    # The compact way, using NumPy-style integer-array indexing
    _nll = -input[range(B), target].mean()
    print(_nll)
    # tensor(0.3321)
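
    This works because indexing with range(B) and target together pairs them element-wise, picking element (i, target[i]) for each row i. The same integer-array indexing exists in plain NumPy (the array a and indices idx below are made up for illustration):

    import numpy as np

    a = np.arange(12).reshape(4, 3)
    """
    >>> a
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])
    """

    idx = np.array([0, 2, 2, 1])
    print(a[range(4), idx])  # picks a[0,0], a[1,2], a[2,2], a[3,1]
    # [ 0  5  8 10]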
    

    The two ways of computing give the same result.
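
    As a sanity check (not part of the original tutorial), PyTorch's built-in F.nll_loss should give the same value on the same input and target:

    import torch.nn.functional as F

    print(F.nll_loss(input, target))
    # tensor(0.3321)

    Hope this helps.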