pytorch, linear-algebra, probability, matrix-multiplication, variance

Calculating Variance over multiple axes simultaneously


I am trying to understand how variance over multiple axes works and whether it can be achieved using individual operations. Suppose we have an input 3D tensor with shape (2,2,3) and we are interested in calculating the variance over axes 0 and 2. With PyTorch tensors this can be achieved with torch.var(input, axis=(0,2)). Now I am trying to reproduce this in parts, and the code is given below.

import torch

# Create a sample 3D tensor
input_tensor = torch.tensor([
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0],
     [10.0, 11.0, 12.0]]
])

# calculating variance over multiple axes
input_tensor.var(axis=(0, 2), unbiased=False)

# calculating the same quantity using individual operations
((input_tensor - input_tensor.mean(axis=(0, 2), keepdims=True)) ** 2).mean((0, 2))

I have got some gist of it, but I would like to know how it works internally and what the matrix looks like when we reduce over multiple axes at the same time. I can relate them in 2D, but I am confused in 3D, especially for the axes mentioned above. Also, the PyTorch result changes when we set unbiased=True. I am looking for a possible explanation of how this is done mathematically under the hood and how it can be achieved with individual operations.


Solution

  • Calculating the variance over multiple dimensions can be thought of as calculating the variance over a 1D vector whose components are the concatenation of the inner dimension over the outer dimension, with the inner dimension being dimension 2 and the outer being dimension 0, reading the shape from right to left.

    Consider the following code as an illustration:

    import torch
    test_tensor = torch.randn((3,4,5))
    v1 = test_tensor.var(dim=[0,2], unbiased=False)[0]  # pick the first coordinate of the result
    mean = 0
    for i in range(3):
        for j in range(5):
            mean += test_tensor[i, 0, j].item()
    mean /= 15
    v2 = 0
    for i in range(3):
        for j in range(5):
            v2 += (test_tensor[i, 0, j].item() - mean)**2
    v2 /= 15
    print(v1.item(), v2)
    
    
    

    You can try it out and verify that both ways of calculating the variance are equivalent.
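    The same check can also be expressed without loops. Here is a minimal sketch, assuming the same (3, 4, 5) shape as above: move the kept dimension (dim 1) to the front, merge the two reduced dimensions into a single axis of 3*5 = 15 elements, and take the variance along that axis.

    import torch

    test_tensor = torch.randn((3, 4, 5))
    # move dim 1 to the front, then merge dims 0 and 2 into one axis of length 15
    v_multi = test_tensor.var(dim=[0, 2], unbiased=False)                          # shape (4,)
    v_merged = test_tensor.permute(1, 0, 2).reshape(4, -1).var(dim=1, unbiased=False)
    print(torch.allclose(v_multi, v_merged))  # expect True for every coordinate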

    Note the order of nesting in the explicit calculation: for each full pass over the second dimension, I take one step over the zero dimension. Mathematically, this does not matter for the mean and variance, because those functions do not depend on the order of the elements, so I could just as well switch the order and pass over the zero dimension for each step over the second dimension. For other functions, though, the order may matter.

    The order I specified - one step in the zero dimension for each full pass over the second - is consistent with what you would have gotten had you tried this:

    test_tensor[:, 0, :].flatten()
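    You can confirm that this flattened slice lists the elements in exactly that order. A small sketch, using a fresh random tensor of the same shape; the nested list comprehension mirrors the loops above:

    import torch

    test_tensor = torch.randn((3, 4, 5))
    flat = test_tensor[:, 0, :].flatten()
    # rebuild the same 15 values in the loop order used above:
    # one step in dim 0 per full pass over dim 2
    looped = torch.tensor([test_tensor[i, 0, j].item() for i in range(3) for j in range(5)])
    print(torch.equal(flat, looped))  # True: same elements in the same order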
    

    Regarding biased vs. unbiased: It's about estimating the variance of some population by calculating the variance of a finite sample of that population. Suppose you can have many finite samples from the same population. For each sample, you calculate the variance. Those variances will be different, since you sample from a population instead of having the entire population.

    Now, you compute the mean of those variances you calculated. How would that mean relate to the actual variance of the original population? If they (the mean of sample variances and the actual population variance) agree, this is an unbiased estimator. If they do not agree, it's a biased estimator.

    It turns out that if you want the estimator to be unbiased, you divide the sum of squared distances from the sample mean by n-1 rather than by n; dividing by n systematically underestimates the population variance, because the deviations are measured from the sample mean, which is itself fit to the same data.
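    To see the difference numerically, here is a rough simulation sketch (the population, sample size, and number of samples are arbitrary choices for illustration): draw many small samples from one population, average the per-sample variances computed with divisor n and with divisor n-1, and compare both averages to the population variance.

    import torch

    torch.manual_seed(0)
    population = torch.randn(100_000) * 2.0             # population variance ~ 4
    pop_var = population.var(unbiased=False)

    n = 5                                                # small sample size
    idx = torch.randint(0, population.numel(), (10_000, n))
    samples = population[idx]                            # 10,000 samples of size n

    biased = samples.var(dim=1, unbiased=False).mean()   # divide by n
    unbiased = samples.var(dim=1, unbiased=True).mean()  # divide by n-1

    print(pop_var.item(), biased.item(), unbiased.item())
    # the biased average lands below the population variance (roughly by a factor of (n-1)/n),
    # while the unbiased average lands close to it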

    For more details, see Bessel's correction