I have two vectors, rev_count and stars. Their elements form pairs (let's say rev_count is the x coordinate and stars is the y coordinate).
I would like to bin the data by rev_count and then average the stars within each rev_count bin (I want to bin along the x axis and compute the average y coordinate in each bin).
This is the code that I tried to use (inspired by my MATLAB background):
import matplotlib.pyplot as plt
import numpy
binwidth = numpy.max(rev_count)/10
revbin = range(0, numpy.max(rev_count), binwidth)
revbinnedstars = [None]*len(revbin)
for i in range(0, len(revbin)-1):
    revbinnedstars[i] = numpy.mean(stars[numpy.argwhere((revbin[i]-binwidth/2) < rev_count < (revbin[i]+binwidth/2))])
print('Plotting binned stars with count')
plt.figure(3)
plt.plot(revbin, revbinnedstars, '.')
plt.show()
However, this seems to be incredibly slow/inefficient. Is there a more natural way to do this in Python?
SciPy has a function for this:
from scipy.stats import binned_statistic
revbinnedstars, edges, _ = binned_statistic(rev_count, stars, 'mean', bins=10)
revbin = edges[:-1]
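If you don't want to use SciPy, NumPy's histogram function can do the same thing: a histogram weighted by stars gives the per-bin sum, and dividing by the plain counts gives the per-bin mean (bins with no points come out as NaN):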
sums, edges = numpy.histogram(rev_count, bins=10, weights=stars)
counts, _ = numpy.histogram(rev_count, bins=10)
revbinnedstars = sums / counts
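For plotting, it may help to wrap the NumPy version in a small helper that returns bin centers instead of left edges and fills empty bins with NaN without triggering a divide warning. This is only a sketch: the name binned_mean and the sample data are made up for illustration, and it assumes equal-width bins as in the snippets above.

```python
import numpy as np

def binned_mean(x, y, bins=10):
    """Mean of y within equal-width bins of x; NaN where a bin is empty."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Weighted histogram: per-bin sum of y
    sums, edges = np.histogram(x, bins=bins, weights=y)
    # Reuse the same edges so the counts line up with the sums
    counts, _ = np.histogram(x, bins=edges)
    # Divide only where a bin has data; other entries stay NaN
    means = np.full(len(counts), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    # Midpoint of each bin, convenient as the x coordinate when plotting
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, means

# Made-up sample data, purely for illustration
rev_count = np.array([1, 2, 3, 10, 11, 12, 30, 31])
stars = np.array([1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 2.0, 4.0])
centers, means = binned_mean(rev_count, stars, bins=3)
```

The result can then be plotted with plt.plot(centers, means, '.'), and matplotlib simply skips the NaN entries.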