
PyTorch - Getting gradient for intermediate variables / tensors


As an exercise with the PyTorch framework (0.4.1), I am trying to display the gradient of X (gX or dSdX) in a simple linear layer (Z = X.W + B). To keep my toy example simple, I call backward() on a sum of Z (not on a loss).

To sum up, I want gX (dSdX) of S = sum(XW + B).

The problem is that the gradient of Z (dSdZ) is None. As a result, gX is of course wrong too.

import torch
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)  # only X tracks gradients
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)  # Z = X.W + B
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Result:

Z:
 tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
 None
gX:
 tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])

I get exactly the same result if I use an nn.Module, as below:

class Net1Linear(torch.nn.Module):
    def __init__(self, wi, wo, W, B):
        super(Net1Linear, self).__init__()
        self.linear1 = torch.nn.Linear(wi, wo)
        self.linear1.weight = torch.nn.Parameter(W.t())  # nn.Linear stores its weight as (out, in)
        self.linear1.bias = torch.nn.Parameter(B)
    def forward(self, x):
        return self.linear1(x)
net = Net1Linear(3, 2, W, B)
Z = net(X)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Solution

  • First of all, gradients are only calculated for tensors for which you have enabled them by setting requires_grad to True.

    So your output is just as one would expect. You get the gradient for X.
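
    In fact you can check that value by hand: since S = sum(XW + B), every row of dSdX is just the row sums of W (2.1 + 1.5 = 3.6, -1.4 + 0.5 = -0.9, 0.2 + 1.1 = 1.3), and dSdZ would be a matrix of ones. A quick sanity check along those lines (reusing X and W from your snippet, after your call to backward()) could look like this:

    # expected dS/dX: every row equals the row sums of W, broadcast to X's shape
    expected_gX = W.sum(dim=1).expand_as(X)
    print(expected_gX)
    # tensor([[ 3.6000, -0.9000,  1.3000],
    #         [ 3.6000, -0.9000,  1.3000]])
    print(torch.allclose(X.grad, expected_gX))  # True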

    PyTorch does not save gradients of intermediate results for performance reasons, so you will only get gradients for those tensors for which you set requires_grad to True.

    However, you can use register_hook to extract the intermediate gradient during the backward pass or to save it manually. Here I just save it to the grad attribute of the tensor Z:

    import torch
    
    # function to extract grad
    def set_grad(var):
        def hook(grad):
            var.grad = grad
        return hook
    
    X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
    W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
    B = torch.tensor([1.1, -0.3])
    Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
    
    # register_hook for Z
    Z.register_hook(set_grad(Z))
    
    S = torch.sum(Z)
    S.backward()
    print("Z:\n", Z)
    print("gZ:\n", Z.grad)
    print("gX:\n", X.grad)
    

    This will output:

    Z:
     tensor([[2.1500, 2.9100],
            [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
    gZ:
     tensor([[1., 1.],
            [1., 1.]])
    gX:
     tensor([[ 3.6000, -0.9000,  1.3000],
            [ 3.6000, -0.9000,  1.3000]])
    

    Hope this helps!

    Btw.: Normally you would want gradients to be enabled for your parameters, i.e. your weights and biases, because as things stand an optimizer would be altering your inputs X and not your weights W and bias B. So in such a case the gradient is usually activated for W and B.
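
    A minimal sketch of that more usual setup, with W and B wrapped as nn.Parameters and a (here arbitrarily chosen) SGD optimizer updating them instead of X, could look like this:

    import torch

    X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]])  # plain input, no gradient needed
    W = torch.nn.Parameter(torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]]))
    B = torch.nn.Parameter(torch.tensor([1.1, -0.3]))

    optimizer = torch.optim.SGD([W, B], lr=0.1)  # learning rate chosen arbitrarily

    Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
    S = torch.sum(Z)

    optimizer.zero_grad()
    S.backward()
    optimizer.step()  # updates W and B in place; X stays untouched

    print("gW:\n", W.grad)
    print("gB:\n", B.grad)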