Search code examples
pythonmachine-learningneural-networkpytorch

What exactly does the forward function output in Pytorch?


This example is taken verbatim from the PyTorch Documentation. Now I do have some background on Deep Learning in general and know that it should be obvious that the forward call represents a forward pass, passing through different layers and finally reaching the end, with 10 outputs in this case, then you take the output of the forward pass and compute the loss using the loss function one defined. Now, I forgot what exactly the output from the forward() pass yields me in this scenario.

I thought that the last layer in a Neural Network should be some sort of activation function like sigmoid() or softmax(), but I did not see these being defined anywhere, furthermore, when I was doing a project now, I found out that softmax() is called later on. So I just want to clarify what exactly is the outputs = net(inputs) giving me, from this link, it seems to me by default the output of a PyTorch model's forward pass is logits?

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        print(outputs)
        break
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Solution

  • it seems to me by default the output of a PyTorch model's forward pass is logits

    As I can see from the forward pass, yes, your function is passing the raw output

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x
    

    So, where is softmax? Right here:

    criterion = nn.CrossEntropyLoss()
    

    It's a bit masked, but inside this function is handled the softmax computation which, of course, works with the raw output of your last layer

    This is softmax calculation:

    softmax

    where z_i are the raw outputs of the neural network

    So, in conclusion, there is no activation function in your last input because it's handled by the nn.CrossEntropyLoss class

    Answering what's the raw output that comes from nn.Linear: The raw output of a neural network layer is the linear combination of the values that come from the neurons of the previous layer