Good evening,
I hope that you are well.
I have a list of numbers which I would like to group into "bins" or "buckets" based on ranges that I define. I would like the output to display each group "name" and the total number of values that fall within that range.
For example :-
my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
Example criteria
I could obviously achieve this in a very crude way by writing multiple conditional if statements but I am aware that there must be a more elegant way of achieving the result.
Various searches have indicate that I could possibly achieve this using pandas.cut / digitize however as of yet, I have been unsuccessful in achieving the required output.
Any assistance would be much appreciated.
Many thanks
James
you are right, you can use pd.cut
combined with a groupby to achieve what you want.
Step 1: Define data
import pandas as pd
import numpy as np
my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000', 'between 10,000 and 100,000', 'between 100,000 and 250,000', 'greater than 250,000']
Step 2: Generate the category name using pd.cut, and set index for groupby later
df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)
Step 3: groupby to do the count
df.groupby(level='category').count()
Output:
EDIT
As pointed out in the comments numpy.histogram
is another possibly more concise approach which will work. This answer used pd.cut
as it was specifically mentioned in the question.