python numpy pytorch gaussian probability-distribution

Getting variance values for random samples generated from a standard normal distribution using numpy


I have a function that gives me probability distributions for each class, in terms of a matrix corresponding to mean values and another matrix corresponding to variance values. For example, if I had four classes then I would have the following outputs:

y_means = [1,2,3,4]
y_variance = [0.01,0.02,0.03,0.04]

I need to apply the following calculation to the mean values to continue with the rest of my program:

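import numpy as np

y_means = np.array(y_means)
y_means = np.reshape(y_means,(y_means.size,1))  # column vector, shape (4, 1)
A = np.random.randn(10,y_means.size)            # standard normal samples, shape (10, 4)
y_means = np.matmul(A,y_means)                  # shape (10, 1)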

Here, I use the numpy.random.randn function to draw random samples from a standard normal distribution, and then multiply the resulting matrix A by the column vector of mean values to obtain a new output matrix. The output matrix has shape (10 x 1).

I need a similar calculation so that my output_variances is also a (10 x 1) matrix. However, multiplying the variances by random samples from a standard normal distribution in the same way is not meaningful, because the result could contain negative values. This is undesirable because my ultimate aim is to create a normal distribution from these mean values and their corresponding variance values using:

torch.distributions.normal.Normal(loc=y_means, scale=y_variance)

So my question is whether there is any method by which I can get a variance value for each random sample generated by numpy.random.randn. Multiplying such a matrix with output_variance would then make more sense.

Or if there is any other strategy for this that I might be unaware of, please let me know.
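
One possible strategy, sketched here under two assumptions that are not stated in the question: the class outputs are independent, and A is treated as fixed once it has been drawn. In that case the variance of each linear combination is the sum of the squared coefficients times the individual variances, so the variances can be propagated with the element-wise square of A. The names out_means, out_variance and out_std are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

y_means = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)         # (4, 1)
y_variance = np.array([0.01, 0.02, 0.03, 0.04]).reshape(-1, 1)  # (4, 1)

# Same role as np.random.randn(10, 4) in the question
A = rng.standard_normal((10, y_means.size))                     # (10, 4)

out_means = A @ y_means                                          # (10, 1)

# Assumption: independent class outputs, so
# Var(sum_j A[i, j] * y[j]) = sum_j A[i, j]**2 * Var(y[j])
out_variance = (A ** 2) @ y_variance                             # (10, 1), never negative

# Standard deviation corresponding to each propagated variance
out_std = np.sqrt(out_variance)                                  # (10, 1)
```

Because the scale argument of torch.distributions.normal.Normal is a standard deviation rather than a variance, out_std (converted to a tensor) would be the value to pass as scale, with out_means as loc.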


Solution

  • The problem in the question required another matrix with the same dimensions as A, holding a variance measure for each random sample in A.

    Taking a row-wise or column-wise variance of A using numpy.var() collapses one axis, so it does not give a 10 x 4 matrix to multiply with y_variance.

    I solved the problem using the following approach:

    First create a matrix with the same dimensions as A with zero entries, using the following line of code:

    A_var = np.zeros_like(A)
    

    Then, using torch.distributions, create normal distributions with the values in A as the mean and zeros as the scale (note that the scale argument of Normal is the standard deviation, not the variance):

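    # validate_args=False: recent PyTorch versions otherwise reject a scale of zero
    dist_A = torch.distributions.normal.Normal(loc=torch.Tensor(A), scale=torch.Tensor(A_var), validate_args=False)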
    

    https://pytorch.org/docs/stable/distributions.html lists all the operations available on Normal distributions in PyTorch. The sample() method can generate samples from a given distribution for any sample shape. I used this to first generate a sample tensor of size 10 x 10 x 4 and then calculate the variance along axis 0:

    np.var(np.array(dist_A.sample((10,))),axis=0)
    

    This would result in a variance matrix of size 10 x 4, which can be used for calculations with y_variance.
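
A quick shape check of the sampling step can be sketched as follows. To keep the snippet dependent on NumPy only, it uses NumPy's Generator.normal as a stand-in for dist_A.sample((10,)); that substitution is an assumption of this sketch, not part of the approach above. One thing worth noting: with a scale of zero, every draw equals its mean, so the variance along axis 0 comes out as (numerically) zero. A positive scale would be needed to obtain non-trivial values.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
A = rng.standard_normal((10, 4))

# NumPy stand-in for dist_A.sample((10,)): 10 draws per entry of A, with zero scale
samples = rng.normal(loc=A, scale=0.0, size=(10,) + A.shape)  # (10, 10, 4)

# Variance across the 10 draws, one value per entry of A
A_var = np.var(samples, axis=0)                               # (10, 4)

# With zero scale each draw equals its mean, so A_var is all (numerically) zero
print(samples.shape, A_var.shape, np.allclose(A_var, 0.0))
```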