Search code examples
pythonperformancescipynormal-distribution

Python: Calculate likelihood of a value for N number of multivariate normal distributions


So I have a a set of N multivariate normal distributions, which all have the same covariance. For each of these distributions, I want to calculate the likelihood of getting the value x.

For a single distribution, and multiple number of "x" values, this is trivial

from scipy.stats import multivariate_normal
import numpy as np

cov = [[1 ,0.1],[0.1 ,1]]
mean = [0,0]
Values = np.random.multivariate_normal([0,0],cov,samp)
print  multivariate_normal.pdf(Values, mean, cov)

Now, if we reverse this, and assume we only have one Value to check for, but multiple means but same covariance each time. Like the following (Of course in the real case, the mean is different in each iteration)

means = [mean]*samples
Value = Values[0,:]

L = []
for iMean in means:
    L.append(multivariate_normal.pdf(Value, iMean, cov))

print L

Is there a better way of doing this? If there is any difference, then assuming that the covariance matrix is uncorrelated is also allowed, although a general solution is preferable.


Solution

  • You can first calculate the squared mahalanobis distance for all distributions. https://en.wikipedia.org/wiki/Mahalanobis_distance

    Afterwards you calculate the probablilty density.

    *https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.multivariate_normal.html *https://en.wikipedia.org/wiki/Multivariate_normal_distribution

    By using numpy arrays you can avoid slow python loops. I added this to your example:

    from scipy.stats import multivariate_normal
    import numpy as np
    
    cov = [[1 ,0.5],[0.5 ,1]]
    mean = [2,2]
    
    samples = 10
    means = [mean]*samples
    
    Value = (3,2.5)
    
    L = []
    for iMean in means:
        L.append(multivariate_normal.pdf(Value, iMean, cov))
    
    
    
    mean_array = np.array(means)
    value_array = np.array(Value).astype(np.float)
    cov_array = np.array(cov)
    inv_cov_array = np.linalg.inv(cov_array)
    dim = cov_array.shape[0]
    
    diffs = value_array-mean_array
    maha_distances = np.sum(diffs.transpose()*np.dot(inv_cov_array,diffs.transpose()),axis=0)    
    denominator = 1/np.sqrt((2*np.pi)**dim*np.linalg.det(cov_array))
    
    l = denominator * np.exp(-0.5*maha_distances)
    
    res_dif = np.array(L) - l
    print res_dif