python numpy statistics mean weighted

Weighted mean in numpy/python


I have a large array of continuous values in the range (-100, 100).

For this array, I want to calculate the weighted average described here.

Since the values are continuous, I also want to set breaks every 20, i.e. the values should be discretized to -100, -80, -60, ..., 60, 80, 100.

How can I do this in NumPy, or in Python in general?

EDIT: The difference from the ordinary mean is that this mean is calculated according to the frequency of the values.


Solution

  • You actually have 2 different questions.

    1. How to make data discrete, and
    2. How to make a weighted average.

    It's usually better to ask 1 question at a time, but anyway.

    Given your specification:

    xmin = -100
    xmax = 100
    binsize = 20
    

    First, let's import numpy and make some data:

    import numpy as np
    data = np.arange(xmin, xmax)  # integers from -100 to 99
    

    Then let's make the bin edges you are looking for:

    bins_arange = np.arange(xmin, xmax + 1, binsize)  # -100, -80, ..., 100
    

    From this we can convert the data to the discrete form:

    counts, edges = np.histogram(data, bins=bins_arange)
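
    If you need each individual value replaced by its nearest break (-100, -80, ..., 100) rather than only the per-bin counts, one possible sketch is to round to the nearest multiple of the bin size. The random sample, the seed, and the names `values`, `snapped`, `levels`, `freq` and `freq_mean` are assumptions made for illustration; they are not part of the question.

```python
import numpy as np

xmin, xmax, binsize = -100, 100, 20  # same specification as in the answer

# Hypothetical sample: 1000 continuous values drawn uniformly from [-100, 100)
rng = np.random.default_rng(0)
values = rng.uniform(xmin, xmax, size=1000)

# Snap each value to the nearest multiple of binsize
snapped = np.round(values / binsize) * binsize

# Distinct discrete levels and how often each one occurs
levels, freq = np.unique(snapped, return_counts=True)

# Mean of the discrete levels, weighted by their frequency
freq_mean = np.average(levels, weights=freq)
```

    Note that this snaps to the nearest break, whereas the histogram assigns each value to the interval between two breaks, so the two discretizations are not identical.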
    

    Now, to calculate the weighted average, we can represent each bin by its midpoint (e.g. numbers between -100 and -80 are on average -90):

    bin_middles = (edges[:-1] + edges[1:]) / 2
    

    Note that this method does not require the bins to be evenly spaced, unlike the integer division method.
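
    To illustrate the point about uneven spacing, here is a minimal sketch with bin edges of unequal width. The edge values in `uneven_edges` are made up for the example; the midpoint formula itself is the same as before.

```python
import numpy as np

# Hypothetical uneven bin edges: narrow bins near zero, wide bins in the tails
uneven_edges = np.array([-100, -50, -10, 0, 10, 50, 100])

data = np.arange(-100, 100)  # same toy data as in the answer

# np.histogram accepts any monotonically increasing sequence of edges
counts, edges = np.histogram(data, bins=uneven_edges)

# The midpoint formula works unchanged for bins of unequal width
bin_middles = (edges[:-1] + edges[1:]) / 2
```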

    Then let's make some weights:

    weights = np.array(range(len(counts))) / sum(range(len(counts)))
    

    Then to bring it all together:

    average = np.sum(bin_middles * counts) / np.sum(counts)
    # Normalise by the total effective weight so the result is a true average
    weighted_average = np.sum(bin_middles * counts * weights) / np.sum(counts * weights)
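
    As a cross-check, the same quantities can probably be obtained more directly with `np.average`, which divides by the sum of whatever weights it is given. This is a self-contained sketch that rebuilds the toy data from the steps above; the names `binned_mean`, `manual_mean` and `extra_weighted_mean` are introduced here only for illustration.

```python
import numpy as np

xmin, xmax, binsize = -100, 100, 20
data = np.arange(xmin, xmax)

# Discretize: count how many values fall into each 20-wide bin
counts, edges = np.histogram(data, bins=np.arange(xmin, xmax + 1, binsize))
bin_middles = (edges[:-1] + edges[1:]) / 2

# Frequency-weighted mean of the bin midpoints, computed two ways
binned_mean = np.average(bin_middles, weights=counts)
manual_mean = np.sum(bin_middles * counts) / np.sum(counts)

# Optional extra per-bin weights (here simply proportional to the bin index)
weights = np.arange(len(counts)) / np.arange(len(counts)).sum()
extra_weighted_mean = np.average(bin_middles, weights=counts * weights)
```

    Because every value is replaced by its bin midpoint, `binned_mean` is only an approximation of `data.mean()`; the two agree more closely as the bins get narrower.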