Search code examples
numpyscipyskewkurtosiscode-statistics

Calculate weighted statistical moments in Python


I've been looking for a function or package that would allow me to calculate the skew and kurtosis of a distribution in a weighted way, as I have histogram data.

For instance I have the data

import numpy as np

np.array([[1, 2],
          [2, 5],
          [3, 6],
          [4,12],
          [5, 1])

where the first column [1,2,3,4,5] are the values and the second column [2,5,6,12,1] are the frequencies of the values.

I have found out how to do the first two moments (mean, standard deviation) in a weighted way using the weighted_avg_and_std function specified in this thread, but I was not quite sure how I could extend this to both the skew and kurtosis, or even the nth statistical moment.

I have found the definitions themselves here and could manually write functions to implement this from scratch, but before I go and do that I was wondering if there were any existing packages or functions that might be able to do this.

Thanks

EDIT: I figured it out, the following code works (please note that this is for population moments)

skewnewss = np.average(((values-average)/np.sqrt(variance))**3, weights=weights)

and

kurtosis=np.average(((values-average)/np.sqrt(variance))**4-3, weights=weights)

Solution

  • I think you have already listed all the ingredients that you need, following the formulas in the link you provided:

    import numpy as np
    
    a = np.array([[1,2],[2,5],[3,6],[4,12],[5,1]])
    values, weights = a.T
    
    def n_weighted_moment(values, weights, n):
    
        assert n>0 & (values.shape == weights.shape)
        w_avg = np.average(values, weights = weights)
        w_var = np.sum(weights * (values - w_avg)**2)/np.sum(weights)
    
        if n==1:
            return w_avg
        elif n==2:
            return w_var
        else:
            w_std = np.sqrt(w_var)
            return np.sum(weights * ((values - w_avg)/w_std)**n)/np.sum(weights)
                  #Same as np.average(((values - w_avg)/w_std)**n, weights=weights)
    
    

    Which results in:

    for n in range(1,5):
        print(f'Moment {n} value is {n_weighted_moment(values, weights, n)}')
    
    Moment 1 value is 3.1923076923076925
    Moment 2 value is 1.0784023668639053
    Moment 3 value is -0.5962505715592139
    Moment 4 value is 2.384432138280637
    

    Notice that while you are calculating the excess kurtosis, the formula implemented for a generic n-moment doesn't account for that.