Tags: python, pytorch

Discrepancy between log_prob and manual calculation


I want to define a multivariate normal distribution with mean [1, 1, 1] and a covariance matrix with 0.3 on the diagonal. Then I want to evaluate the log-likelihood at the data point [2, 3, 4].

Using torch.distributions

import torch
import torch.distributions as td

input_x = torch.tensor([2.0, 3.0, 4.0])
loc = torch.ones(3)
scale = torch.eye(3) * 0.3
mvn = td.MultivariateNormal(loc = loc, scale_tril=scale)
mvn.log_prob(input_x)
tensor(-76.9227)

From scratch

Using the formula for the log-likelihood of a multivariate normal:

$$\log p(x) = -\log\!\left((2\pi)^{k/2}\,\lvert\Sigma\rvert^{1/2}\right) - \frac{1}{2}\,(x-\mu)^\top \Sigma^{-1} (x-\mu)$$

we obtain:

import numpy as np

first_term = (2 * np.pi * 0.3) ** 3
first_term = -np.log(np.sqrt(first_term))
x_center = input_x - loc
tmp = torch.matmul(x_center, scale.inverse())
tmp = -1 / 2 * torch.matmul(tmp, x_center)
first_term + tmp
tensor(-24.2842)

where I used the fact that the determinant of a diagonal matrix is the product of its diagonal entries, so |Σ| = 0.3³ and therefore (2π)³|Σ| = (2π · 0.3)³.
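As an independent sanity check of the manual value, SciPy's multivariate normal (assuming scipy is available) gives the same number:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same parameters as above: mean [1, 1, 1], covariance 0.3 * I
mvn = multivariate_normal(mean=np.ones(3), cov=0.3 * np.eye(3))
print(mvn.logpdf([2.0, 3.0, 4.0]))  # ≈ -24.2842, agreeing with the manual result
```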

My question is - what's the source of this discrepancy?


Solution

  • You are passing the covariance matrix as scale_tril instead of as covariance_matrix. From the docs of PyTorch's MultivariateNormal:

    scale_tril (Tensor) – lower-triangular factor of covariance, with positive-valued diagonal

    So, replacing scale_tril with covariance_matrix yields the same result as your manual calculation.

    In [1]: mvn = td.MultivariateNormal(loc = loc, covariance_matrix=scale)
    In [2]: mvn.log_prob(input_x)
    Out[2]: tensor(-24.2842)
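    To see where the original -76.9227 came from: scale_tril is interpreted as a Cholesky factor L, so passing 0.3 * I implies the covariance L @ L.T = 0.09 * I. Plugging that covariance into the same log-likelihood formula reproduces the number (a NumPy sketch):

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0])
mu = np.ones(3)
L = 0.3 * np.eye(3)   # what was passed as scale_tril
cov = L @ L.T         # implied covariance: 0.09 * I

k = 3
diff = x - mu
log_lik = -0.5 * (k * np.log(2 * np.pi) + np.log(np.linalg.det(cov))
                  + diff @ np.linalg.inv(cov) @ diff)
print(log_lik)  # ≈ -76.9227, matching the scale_tril call in the question
```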
    

    However, it is more efficient to use scale_tril, according to the docs:

    ...Using scale_tril will be more efficient:

    You can compute the lower Cholesky factor with torch.linalg.cholesky:

    In [3]: mvn = td.MultivariateNormal(loc = loc, scale_tril=torch.linalg.cholesky(scale))
    In [4]: mvn.log_prob(input_x)
    Out[4]: tensor(-24.2842)
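For this particular covariance the Cholesky factor is easy to verify by hand: a diagonal matrix factors into the elementwise square root of its diagonal, so cholesky(0.3 * I) is sqrt(0.3) * I ≈ 0.5477 * I. Shown here with NumPy's equivalent of torch.linalg.cholesky:

```python
import numpy as np

cov = 0.3 * np.eye(3)
L = np.linalg.cholesky(cov)  # lower-triangular factor of the covariance

print(np.allclose(L, np.sqrt(0.3) * np.eye(3)))  # True: sqrt of the diagonal
print(np.allclose(L @ L.T, cov))                 # True: L @ L.T recovers the covariance
```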