python, machine-learning, pytorch, artificial-intelligence, gradient

How to check the output gradient by each layer in pytorch in my code?


I am working with PyTorch to learn.

And there is a question: how do I check the output gradient of each layer in my code?

My code is below:

# import the necessary libs
import numpy as np
import torch
import time

# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms

# Get GPU Device

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = True, num_workers=4)

# Examine a sample
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1)
                     )
model.to(device)

# Define the loss
criterion = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)

# Define the epochs
epochs = 5

train_losses, test_losses = [], []

# start = time.time()
for e in range(epochs):
    running_loss = 0
    
    for images, labels in trainloader:
        images = images.to(device)
        labels = labels.to(device)
        # Flatten Fashion-MNIST images into a 784-long vector
        images = images.view(images.shape[0], -1)
        

        # Training pass
        optimizer.zero_grad()

        output = model(images)
        loss = criterion(output, labels)
        
        loss.backward()
        
#         print(loss.grad)
        
        optimizer.step()

        running_loss += loss.item()
    
    else:
        print(model[0].grad)

If I print model[0].grad after back-propagation, is it going to be the output gradient of each layer for every epoch?

Or, if I want to know the output gradient of each layer, where and what should I print?

Thank you!!

Thank you for reading


Solution

  • Well, this is a good question if you need to know the inner computations of your model. Let me explain!

    So firstly when you print the model variable you'll get this output:

    Sequential(
      (0): Linear(in_features=784, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=10, bias=True)
      (3): LogSoftmax(dim=1)
    )
    

    And if you index model[0], you have selected the first layer of the model, which is Linear(in_features=784, out_features=128, bias=True). If you look at the documentation of torch.nn.Linear, you will find that there are two attributes of this class that you can access: Linear.weight and Linear.bias, which give you the weights and biases of that layer respectively.
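
    For example, with the model above, something like this should show both attributes of that first layer (the shapes follow from nn.Linear(784, 128)):

    # a quick sketch: inspect the first layer's parameters
    print(model[0].weight.shape)   # torch.Size([128, 784])
    print(model[0].bias.shape)     # torch.Size([128])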

    Remember you cannot use model.weight to look at the weights of the model, because your linear layers are kept inside a container called nn.Sequential, which doesn't have a weight attribute of its own.
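
    If you do want to list everything the container holds, one handy option is model.named_parameters(), which walks through every layer for you. A small sketch with the model above:

    # list every learnable parameter inside the Sequential container
    for name, param in model.named_parameters():
        print(name, param.shape)
    # prints 0.weight, 0.bias, 2.weight, 2.bias (the ReLU and
    # LogSoftmax layers have no parameters, so they don't show up)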

    So, coming back to weights and biases, you can access them per layer: model[0].weight and model[0].bias are the weights and biases of the first layer. Similarly, model[0].weight.grad and model[0].bias.grad are the gradients of the first layer, and they are populated once you call loss.backward().
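
    So in your training loop you could print them right after loss.backward(), since that is the call that fills in the .grad fields. A minimal sketch based on your loop:

    output = model(images)
    loss = criterion(output, labels)
    loss.backward()

    # gradients of the first Linear layer
    print(model[0].weight.grad)
    print(model[0].bias.grad)

    # or print the gradient norm of every layer that has parameters
    for name, param in model.named_parameters():
        print(name, param.grad.norm())

    optimizer.step()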