I have a 2D means matrix in size n*m
, where n
is number of samples and m
is the dimension of the data.
I have as well n
matrices of m*m
, namely sigma is my variance matrix in shape n*m*m
.
I wish to sample n
samples from a the distributions above, such that x_i~N(mean[i], sigma[i])
.
Any way to do that in numpy
or any other standard lib w/o running with a for loop?
The only option I thought was using np.random.multivariate_normal()
by flatting the means matrix to one vector, and flatten the 3D sigma to a 2D blocks-diagonal matrix. And of course reshape afterwards. But that means we are going the sample with sigma in shape (n*m)*(n*m)
which can easily be ridiculously huge, and only computing and allocating that matrix (if possible) can take longer than running in a for loop.
In my specific task, right now Sigma is the same matrix for all the samples, means I can express Sigma in m*m
, and it is the same one for all n points. But I am interested in a general solution.
Appreciate your help.
Difficult to tell without testable code, but this should be close:
A = numpy.linalg.cholesky(sigma) # => shape (n, m, m), same as sigma
Z = np.random.normal(size = (n, m)) # shape (n, m)
X = np.einsum('ijk, ik -> ij', A, Z) + mean # shape (n, m)
What's going on:
We're manually sampling multivariate normal distributions according to the standard Cholesky decomposition method outlined here. A
is built such that [email protected] = sigma
. Then X
(the multivariate normal) can be formed by the dot product of A
and a univariate normal N(0, 1) vector Z
, plus the mean
.
You keep the extraneous dimension throughout the calculation in the first (index = 0, 'i'
in the einsum
) axis, while contracting the last ('k'
) axis, forming the dot product.