Search code examples
pythonnumpymatrixdistributionnormal-distribution

Sample more than one element from multivariable normal distribution


I have a 2D means matrix in size n*m, where n is number of samples and m is the dimension of the data. I have as well n matrices of m*m, namely sigma is my variance matrix in shape n*m*m. I wish to sample n samples from a the distributions above, such that x_i~N(mean[i], sigma[i]). Any way to do that in numpy or any other standard lib w/o running with a for loop?

The only option I thought was using np.random.multivariate_normal() by flatting the means matrix to one vector, and flatten the 3D sigma to a 2D blocks-diagonal matrix. And of course reshape afterwards. But that means we are going the sample with sigma in shape (n*m)*(n*m) which can easily be ridiculously huge, and only computing and allocating that matrix (if possible) can take longer than running in a for loop.

In my specific task, right now Sigma is the same matrix for all the samples, means I can express Sigma in m*m, and it is the same one for all n points. But I am interested in a general solution.

Appreciate your help.


Solution

  • Difficult to tell without testable code, but this should be close:

    A = numpy.linalg.cholesky(sigma)                # => shape (n, m, m), same as sigma
    Z = np.random.normal(size = (n, m))             # shape (n, m)
    X = np.einsum('ijk, ik -> ij', A, Z) + mean     # shape (n, m)
    

    What's going on:

    We're manually sampling multivariate normal distributions according to the standard Cholesky decomposition method outlined here. A is built such that [email protected] = sigma. Then X (the multivariate normal) can be formed by the dot product of A and a univariate normal N(0, 1) vector Z, plus the mean.

    You keep the extraneous dimension throughout the calculation in the first (index = 0, 'i' in the einsum) axis, while contracting the last ('k') axis, forming the dot product.