Python package that supports weighted covariance computation


Is there a Python statistical package that supports the computation of weighted covariance (i.e., each observation has a weight)? Unfortunately, numpy.cov does not support weights.

Preferably it should work within the numpy/scipy framework (i.e., be able to use numpy arrays to speed up the computation).

Thanks a lot!


Solution

  • statsmodels has a weighted covariance calculation in statsmodels.stats.weightstats (the DescrStatsW class).

    It can also be calculated directly, as the following script shows:

    # -*- coding: utf-8 -*-
    """descriptive statistic with case weights
    
    Author: Josef Perktold
    """
    
    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW
    
    
    np.random.seed(987467)
    x = np.random.multivariate_normal([0, 1.], [[1., 0.5], [0.5, 1]], size=20)
    weights = np.random.randint(1, 4, size=20)
    
    xlong = np.repeat(x, weights, axis=0)
    
    ds = DescrStatsW(x, weights=weights)
    
    print('cov statsmodels')
    print(ds.cov)
    
    self = ds  # alias so the expressions copied from statsmodels work unchanged
    
    # calculating it directly, ddof=0: divide by the sum of the weights
    ds_cov = np.dot(self.weights * self.demeaned.T, self.demeaned) / self.sum_weights
    
    print('\nddof=0')
    print(ds_cov)
    print(np.cov(xlong.T, bias=1))
    
    # calculating it directly, ddof=1: divide by the sum of the weights minus one
    ds_cov0 = np.dot(self.weights * self.demeaned.T, self.demeaned) / \
                  (self.sum_weights - 1)
    print('\nddof=1')
    print(ds_cov0)
    print(np.cov(xlong.T, bias=0))
    

    This prints:

    cov statsmodels
    [[ 0.43671986  0.06551506]
     [ 0.06551506  0.66281218]]
    
    ddof=0
    [[ 0.43671986  0.06551506]
     [ 0.06551506  0.66281218]]
    [[ 0.43671986  0.06551506]
     [ 0.06551506  0.66281218]]
    
    ddof=1
    [[ 0.44821249  0.06723914]
     [ 0.06723914  0.68025461]]
    [[ 0.44821249  0.06723914]
     [ 0.06723914  0.68025461]]
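
    In formula form, the two direct calculations in the script can be written as follows. The notation is my own, not taken from statsmodels: x_i is the i-th observation (a row of x), w_i is its weight, and ddof is 0 or 1.

    ```latex
    \bar{x}_w = \frac{\sum_i w_i \, x_i}{\sum_i w_i},
    \qquad
    C_{\mathrm{ddof}} =
      \frac{\sum_i w_i \, (x_i - \bar{x}_w)(x_i - \bar{x}_w)^{\top}}
           {\sum_i w_i - \mathrm{ddof}}
    ```

    With integer weights this is the same as repeating each observation w_i times and taking the ordinary covariance, which is why the results agree with np.cov on xlong.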
    

    Editorial note

    The initial answer pointed out a bug in statsmodels that has been fixed in the meantime.
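
If you would rather not depend on statsmodels, the same calculation can be sketched in plain NumPy. This is only a minimal sketch: the function name weighted_cov and its ddof argument are made up for illustration, and it assumes x is a 2-D array with one observation per row and that the weights are frequency-style counts. As far as I know, newer NumPy releases (1.10 and later) also accept an fweights argument in np.cov, so the sketch cross-checks against that as well as against the repeated-rows trick.

```python
import numpy as np


def weighted_cov(x, weights, ddof=0):
    """Weighted covariance of the columns of x (hypothetical helper).

    x       : 2-D array, one observation per row
    weights : 1-D array, one non-negative weight per observation
    ddof    : subtracted from the sum of weights in the denominator
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    sum_w = w.sum()
    # weighted mean of each column
    mean = (w[:, None] * x).sum(axis=0) / sum_w
    demeaned = x - mean
    # same expression as in the answer: (w * d.T) @ d / (sum_w - ddof)
    return (w * demeaned.T) @ demeaned / (sum_w - ddof)


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
weights = rng.integers(1, 4, size=20)  # integer weights 1, 2 or 3

cov0 = weighted_cov(x, weights, ddof=0)
cov1 = weighted_cov(x, weights, ddof=1)

# Cross-check 1: repeat each row according to its weight
xlong = np.repeat(x, weights, axis=0)
assert np.allclose(cov0, np.cov(xlong.T, bias=True))
assert np.allclose(cov1, np.cov(xlong.T))

# Cross-check 2: np.cov with frequency weights
assert np.allclose(cov1, np.cov(x.T, fweights=weights))
assert np.allclose(cov0, np.cov(x.T, fweights=weights, ddof=0))
```

Note that this treats the weights as frequencies (counts). If your weights express relative importance instead, the ddof=1 denominator above is probably not what you want; np.cov has a separate aweights argument for that case.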