Tags: python-3.x, deep-learning, neural-network, pytorch, torch

Why do 'loss.backward()' and 'weight.grad' return a tensor containing all zeros?


When I run 'loss.backward()' and then inspect 'weight.grad', I get a tensor containing all zeros. Also, 'weight.grad_fn' returns 'None'.

However, everything seems to return the correct result for the second layer 'w2'. If I play with simple operations such as x*2 or x**2, 'backward()' and '.grad' return correct results.
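For instance, a minimal sketch of what I mean (the input values here are just illustrative):

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)     # tensor([2., 4., 6.]) -- non-zero, as expected
print(y.grad_fn)  # <SumBackward0 object at ...> -- not None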

Here's my code:

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)  # use next(dataiter); the .next() method was removed in newer PyTorch versions

#####################################
#####################################
#### NN Part

def activation(x):
    # Sigmoid activation
    return 1/(1+torch.exp(-x))

inputs = images  # images is already a torch tensor, so no numpy conversion is needed
# Flatten the inputs from (64, 1, 28, 28) into (64, 784)
inputs = inputs.reshape(images.shape[0], images.shape[1] * images.shape[2] * images.shape[3])


w1 = torch.randn(784, 256, requires_grad=True)  # n_input, n_hidden
b1 = torch.randn(256)                           # n_hidden

w2 = torch.randn(256, 10, requires_grad=True)   # n_hidden, n_output
b2 = torch.randn(10)                            # n_output

h = activation(torch.mm(inputs, w1) + b1)
y = torch.mm(h, w2) + b2

#print(h)
#print(y)

y.sum().backward()
print(w1.grad)
print(w1.grad_fn)
#print(w2.grad)
#print(w2.grad_fn)

By the way, I get the same problem if I run it this way instead:

images = images.reshape(images.shape[0], -1)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

logits = model(images)
criterion = nn.NLLLoss()

loss = criterion(logits, labels)
print(loss)
print(loss.grad_fn)


print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)

Solution

  • The gradients of w1 are not all zero; there are simply a lot of zeros, especially for the border pixels, because the MNIST images contain many black pixels (zeros). Multiplying by zero yields gradients that are also zero.

    Printing w1.grad only shows you a very small, truncated part of the tensor (the border rows), so the non-zero values simply don't appear in the output.

    w1.grad
    # => tensor([[0., 0., 0.,  ..., 0., 0., 0.],
    #            [0., 0., 0.,  ..., 0., 0., 0.],
    #            [0., 0., 0.,  ..., 0., 0., 0.],
    #            ...,
    #            [0., 0., 0.,  ..., 0., 0., 0.],
    #            [0., 0., 0.,  ..., 0., 0., 0.],
    #            [0., 0., 0.,  ..., 0., 0., 0.]])
    
    # Indices of non-zero elements
    w1.grad.nonzero()
    # => tensor([[ 71,   0],
    #            [ 71,   1],
    #            [ 71,   2],
    #            ...,
    #            [746, 253],
    #            [746, 254],
    #            [746, 255]])
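
    A quick sanity check (a small sketch, not part of the snippet above) is to look at aggregate statistics rather than the truncated printout:

    # The gradient is far from all-zero
    print(w1.grad.abs().sum())         # clearly greater than zero
    print(w1.grad.nonzero().shape[0])  # number of non-zero entries

    # Pick a pixel that actually has a non-zero gradient and print its row
    first = w1.grad.nonzero()[0, 0]
    print(w1.grad[first])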