I have an array x with data like this: [3.1, 3.0, 3.3, 3.5, 3.8, 3.75, 4.0], etc.
I have another variable y with corresponding 0s and 1s, e.g. [0, 1, 0].
From that, I want to get new, separate arrays with the data divided into bins.
freq, bins = np.histogram(X, 5)
That allows me to know the cutoffs for each bin. But how do I actually get that data? For example, if I have two bins (3 to 3.5 and 3.5 to 4), I want to get two arrays in return, like [3.1, 3.2, 3.4, ...] and [3.6, 3.7, 4, ...]. Also, I want the variable y to be split and ordered in the same fashion.
Summary: I am looking for code to break x into bins with the corresponding y values.
I thought about doing something with the bins variable, but I am not sure how to split the data based on the cutoffs. I appreciate any help.
If I graph a normal histogram of X, I get this:
Using code:
d=plt.hist(X, 5, facecolor='blue', alpha=0.5)
Working Code:
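from itertools import tee

import numpy as np

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def getLists(a, b, bin_obj):
    index_list = []
    # Each bin is treated as half-open, [left, right). Note that np.histogram
    # closes the last bin on the right, so the maximum value is not matched here.
    for left, right in pairwise(bin_obj):
        indices = np.where((a >= left) & (a < right))
        index_list += [indices[0]]
    # One array of x values and one of y values per bin
    X_ret = [a[i] for i in index_list]
    Y_ret = [b[i] for i in index_list]
    return (X_ret, Y_ret)

freq, bins = np.histogram(X[:, 0], 5)
Xnew, Ynew = getLists(X[:, 0], Y, bins)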
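There's a handy Python function, pairwise, given as a recipe in the itertools documentation of the standard library (and available directly as itertools.pairwise since Python 3.10).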
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
It can help you to iterate through your bins and get the indices of your elements.
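for left, right in pairwise(bins):
    # Half-open interval [left, right): values equal to the last edge
    # (the maximum of x) are not selected by this test.
    indices = np.where((x >= left) & (x < right))
    print(x[indices], y[indices])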
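One caveat with the loop above: it tests x < right for every bin, while np.histogram closes its last bin on the right, so the maximum value ends up in no group. As a possible alternative, here is a minimal sketch that labels each element with np.digitize, using only the interior bin edges. The function name split_by_bins and the sample y values are made up for illustration; they are not from the question.

```python
import numpy as np

def split_by_bins(x, y, n_bins):
    """Split x and y into one array per bin, using the edges from np.histogram."""
    x = np.asarray(x)
    y = np.asarray(y)
    freq, edges = np.histogram(x, n_bins)
    # Digitizing against the interior edges only gives labels 0 .. n_bins-1,
    # so values equal to the last edge (the maximum) fall into the last bin.
    labels = np.digitize(x, edges[1:-1])
    x_parts = [x[labels == k] for k in range(n_bins)]
    y_parts = [y[labels == k] for k in range(n_bins)]
    return x_parts, y_parts, freq

x = np.array([3.1, 3.0, 3.3, 3.5, 3.8, 3.75, 4.0])
y = np.array([0, 1, 0, 1, 1, 0, 1])  # hypothetical labels, same length as x

x_parts, y_parts, freq = split_by_bins(x, y, 2)
for xs, ys in zip(x_parts, y_parts):
    print(xs, ys)
```

Within each bin the elements keep their original order, and x and y stay aligned because both are indexed with the same boolean mask. The group sizes should agree with the counts returned by np.histogram, which is an easy sanity check.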