
Python: Binning one coordinate and averaging another based on these bins


I have two vectors rev_count and stars. The elements of those form pairs (let's say rev_count is the x coordinate and stars is the y coordinate).

I would like to bin the data by rev_count and then average the stars in a single rev_count bin (I want to bin along the x axis and compute the average y coordinate in that bin).

This is the code that I tried to use (inspired by my MATLAB background):

import matplotlib.pyplot as plt
import numpy

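binwidth = numpy.max(rev_count) // 10  # integer width (assumes integer counts) so range() accepts it
revbin = range(0, numpy.max(rev_count), binwidth)
revbinnedstars = [None]*len(revbin)

for i in range(len(revbin)):
    # a chained comparison (a < x < b) raises on arrays, so combine two masks with &
    in_bin = ((revbin[i]-binwidth/2) < rev_count) & (rev_count < (revbin[i]+binwidth/2))
    revbinnedstars[i] = numpy.mean(stars[in_bin])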

print('Plotting binned stars with count')
plt.figure(3)
plt.plot(revbin, revbinnedstars, '.')
plt.show()

However, this seems to be incredibly slow and inefficient. Is there a more natural way to do this in Python?


Solution

  • SciPy has a function for this:

    from scipy.stats import binned_statistic
    
    revbinnedstars, edges, _ = binned_statistic(rev_count, stars, 'mean', bins=10)
    revbin = edges[:-1]
    

    If you don't want to use SciPy, NumPy's histogram function can do the same job:

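    # Sum of stars per bin divided by the number of points per bin gives the mean.
    sums, edges = numpy.histogram(rev_count, bins=10, weights=stars)
    counts, _ = numpy.histogram(rev_count, bins=10)
    revbinnedstars = sums / counts  # empty bins give nan (with a RuntimeWarning)
    revbin = edges[:-1]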
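
As a rough, self-contained sketch of the NumPy-only approach, the snippet below wraps the two histogram calls in a small helper. The name `binned_mean`, the choice to return bin centers rather than left edges, and the sample data are all assumptions made for illustration; they are not part of the original answer. Empty bins are reported as NaN instead of triggering a divide-by-zero warning.

```python
import numpy as np

def binned_mean(x, y, bins=10):
    """Mean of y within equal-width bins of x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Per-bin sum of y, then per-bin point count on the same edges.
    sums, edges = np.histogram(x, bins=bins, weights=y)
    counts, _ = np.histogram(x, bins=edges)
    # Leave NaN in empty bins rather than dividing by zero.
    means = np.full(len(counts), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    # Bin centers are usually a better x position for plotting than left edges.
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, means

# Made-up data, purely for illustration.
rev_count = np.array([0, 1, 2, 3, 10, 11, 20])
stars = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 2.0])

centers, means = binned_mean(rev_count, stars, bins=4)
# centers -> [2.5, 7.5, 12.5, 17.5]
# means   -> [2.5, nan, 4.0, 2.0]
```

The result can be plotted as in the question with `plt.plot(centers, means, '.')`; matplotlib simply skips the NaN entries.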