Tags: pytorch, automatic-differentiation

Why is the Jacobian of a 4x4 matrix-valued function calculated incorrectly by PyTorch's torch.autograd.functional.jacobian?


(Pardon the formatting: Stack Overflow does not have native math support, so the equations below are written out in plain LaTeX.)

I am doing some tests on calculating the Jacobian of various matrix-valued functions using PyTorch. First, I conducted tests on a 2D matrix-valued function:

$$f(x, y) = \begin{pmatrix} x & 0 \\ 0 & y \end{pmatrix}$$

Analytically its Jacobian would be:

$$\frac{\partial f}{\partial x} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \frac{\partial f}{\partial y} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$

In code:

import torch

def matrix_f(coords):
    x = coords[0]
    y = coords[1]
    g = torch.zeros(2, 2)
    g[0][0] = x
    g[1][1] = y
    return g

def calculate_jacob(matrix, x, y):
    coord = torch.tensor([x, y], requires_grad=True)
    jacobian = torch.autograd.functional.jacobian(matrix, coord)
    return jacobian

calculate_jacob(matrix_f, 2.0, 2.0)

This gives the correct output, which matches the analytical solution:

tensor([[[1., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 1.]]])

However, if I use a 4x4 matrix-valued function, such as the Minkowski metric in spherical coordinates, I seem to run into issues. Given the spherical Minkowski metric:

$$g_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2 \sin^2\theta \end{pmatrix}$$

The analytical Jacobian would be:

$$\frac{\partial g}{\partial t} = 0, \qquad \frac{\partial g}{\partial r} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2r & 0 \\ 0 & 0 & 0 & 2r \sin^2\theta \end{pmatrix}, \qquad \frac{\partial g}{\partial \theta} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2r^2 \sin\theta \cos\theta \end{pmatrix}, \qquad \frac{\partial g}{\partial \phi} = 0$$

However, if I run the code for computing the Jacobian:

import numpy as np
import torch

def minkowski_metric_spherical(coords):
    t = coords[0]
    r = coords[1]
    theta = coords[2]
    phi = coords[3]
    g = torch.zeros(4, 4)
    g[0][0] = -1
    g[1][1] = 1
    g[2][2] = r ** 2
    g[3][3] = r ** 2 * torch.sin(theta) ** 2
    return g

def test_minkowski_jacob(metric, t, r, theta, phi):
    coord = torch.tensor([t, r, theta, phi], requires_grad=True)
    jacobian = torch.autograd.functional.jacobian(metric, coord)
    return jacobian

test_minkowski_jacob(minkowski_metric_spherical, 0.0, 1.0, np.pi / 4, 0.0)

The result does not match the analytical solution:

tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 2.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 1.0000, 1.0000, 0.0000]]])

Notably, the nonzero terms in PyTorch's result are not in the positions the analytical Jacobian predicts, and the last 2D matrix in the PyTorch Jacobian is nonzero, while the last 2D matrix in the analytical Jacobian is all zero.

What is the issue?


Solution

    There are several things going on here:

    First, note that your analytical Jacobian is incorrect: your matrix-valued function is a function from 2D real vectors (r, theta) to 4x4 matrices, whose dimension is 16, not 4:

    $$g(r, \theta) = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2 \sin^2\theta \end{pmatrix}, \qquad g \colon \mathbb{R}^2 \to \mathbb{R}^{4 \times 4}$$

    Now, your definition of the Jacobian is correct: it is a vector of function derivatives, where each element is the derivative with respect to a different variable. Since your function depends on just 2 variables, you would expect it to be a vector of two elements, not four as in the vector that you provided:

    $$J = \left( \frac{\partial g}{\partial r},\ \frac{\partial g}{\partial \theta} \right) = \left( \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2r & 0 \\ 0 & 0 & 0 & 2r \sin^2\theta \end{pmatrix},\ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2r^2 \sin\theta \cos\theta \end{pmatrix} \right)$$
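    To see this two-variable view in code, here is a minimal sketch (the helper minkowski_metric_rt is hypothetical, written only for illustration) that passes just (r, theta) to torch.autograd.functional.jacobian and gets back one 4x4 derivative matrix per variable:

        import torch
        import numpy as np

        def minkowski_metric_rt(coords):
            # The same metric, but written as a function of the two variables
            # it actually depends on: r and theta.
            r, theta = coords[0], coords[1]
            g = torch.zeros(4, 4)
            g[0][0] = -1
            g[1][1] = 1
            g[2][2] = r ** 2
            g[3][3] = r ** 2 * torch.sin(theta) ** 2
            return g

        coords = torch.tensor([1.0, np.pi / 4])
        j_rt = torch.autograd.functional.jacobian(minkowski_metric_rt, coords)
        print(j_rt.shape)  # torch.Size([4, 4, 2]): one 4x4 matrix for r, one for theta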

    In terms of your code, you differentiate your function of two variables with respect to four variables:

        coord = torch.tensor([t, r, theta, phi], requires_grad=True)
        jacobian = torch.autograd.functional.jacobian(metric, coord)
    

    Note that your function g:

        g = torch.zeros(4, 4)
        g[0][0] = -1
        g[1][1] = 1
        g[2][2] = r ** 2
        g[3][3] = r ** 2 * torch.sin(theta) ** 2
    

    does not depend on t or on phi, which also matches the analytic discussion. Well, what happens when you differentiate your function with respect to variables on which it does not depend? Nothing special. You just get zeros. Much like with the simple function f(x) = x^2: were you to differentiate it with respect to t, you would get 0, because x^2 is a constant with respect to t.
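    The same point can be checked numerically; a minimal sketch with a toy two-element input where only the first element is used (this snippet is illustrative, not your metric):

        import torch

        def f(coords):
            # f depends only on coords[0]; coords[1] plays the role of t above.
            x = coords[0]
            return x ** 2

        coords = torch.tensor([3.0, 5.0])
        print(torch.autograd.functional.jacobian(f, coords))
        # tensor([6., 0.])  -> d(x^2)/dx = 2x = 6, and 0 for the unused variable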

    Similarly, in the matrix case, you get a 4x4 zero matrix each time you differentiate the metric with respect to a variable on which it does not depend, such as phi or t.

    Finally, note the shape of your jacobian tensor:

    import torch
    import numpy as np
    def minkowski_metric_spherical(coords):
        t = coords[0]
        r = coords[1]
        theta = coords[2]
        phi = coords[3]
        g = torch.zeros(4, 4)
        g[0][0] = -1
        g[1][1] = 1
        g[2][2] = r ** 2
        g[3][3] = r ** 2 * torch.sin(theta) ** 2
        return g
    
    def test_minkowski_jacob(metric, t, r, theta, phi):
        coord = torch.tensor([t, r, theta, phi], requires_grad=True)
        jacobian = torch.autograd.functional.jacobian(metric, coord)
        return jacobian
    
    j = test_minkowski_jacob(minkowski_metric_spherical, 0.0, 1.0, np.pi / 4, 0.0)
    print(j.shape)
    

    The shape is 4 x 4 x 4. This is expected: you differentiated your function with respect to 4 variables, so you expect four elements, and since your function outputs a 4x4 matrix, you expect each element to be a 4x4 matrix. In total, 4x4x4 values.
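    More generally, for a single tensor input and output, torch.autograd.functional.jacobian returns a tensor of shape output.shape + input.shape; a minimal sketch with a toy function to illustrate:

        import torch

        def f(v):
            # 2-vector in, 3x3 matrix out
            return v[0] * torch.eye(3)

        v = torch.tensor([2.0, 5.0])
        print(torch.autograd.functional.jacobian(f, v).shape)  # torch.Size([3, 3, 2])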

    The last piece of the puzzle is that the variables are indexed in the last dimension. That is, j[:, :, 0] is the derivative with respect to t, j[:, :, 1] is the derivative with respect to r, j[:, :, 2] is the derivative with respect to theta, and j[:, :, 3] is the derivative with respect to phi.

    Let's verify:

    j = test_minkowski_jacob(minkowski_metric_spherical, 0.0, 1.0, np.pi / 4, 0.0)
    print(j.shape)
    for i in range(4):
        print(j[:,:,i])
    

    Which gives:

    tensor([[0., 0., 0., 0.],
            [0., 0., 0., 0.],
            [0., 0., 0., 0.],
            [0., 0., 0., 0.]])
    
    
    tensor([[0.0000, 0.0000, 0.0000, 0.0000],
            [0.0000, 0.0000, 0.0000, 0.0000],
            [0.0000, 0.0000, 2.0000, 0.0000],
            [0.0000, 0.0000, 0.0000, 1.0000]])
    
    
    tensor([[0.0000, 0.0000, 0.0000, 0.0000],
            [0.0000, 0.0000, 0.0000, 0.0000],
            [0.0000, 0.0000, 0.0000, 0.0000],
            [0.0000, 0.0000, 0.0000, 1.0000]])
    
    
    tensor([[0., 0., 0., 0.],
            [0., 0., 0., 0.],
            [0., 0., 0., 0.],
            [0., 0., 0., 0.]])
    

    Which matches the analytic discussion precisely.
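    For completeness, a minimal sketch of a direct numeric comparison against the analytic derivatives dg/dr = diag(0, 0, 2r, 2r sin^2(theta)) and dg/dtheta = diag(0, 0, 0, 2r^2 sin(theta) cos(theta)), reusing j from the snippet above:

        import numpy as np
        import torch

        r, theta = 1.0, np.pi / 4

        # Analytic derivatives of the metric, evaluated at (r, theta)
        dg_dr = torch.diag(torch.tensor([0.0, 0.0, 2 * r, 2 * r * np.sin(theta) ** 2]))
        dg_dtheta = torch.diag(torch.tensor([0.0, 0.0, 0.0, 2 * r ** 2 * np.sin(theta) * np.cos(theta)]))

        print(torch.allclose(j[:, :, 1], dg_dr))      # True
        print(torch.allclose(j[:, :, 2], dg_dtheta))  # True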