Python - Group / count a list of numbers by custom ranges

Good evening,

I hope that you are well.

I have a list of numbers which I would like to group into "bins" or "buckets" based on ranges that I define. I would like the output to display each group "name" and the total number of values that fall within that range.

For example :-

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]

Example criteria

greater than 250,000 (output: 1)
greater than 100,000 but less than or equal to 250,000 (output: 1)
greater than 10,000 but less than or equal to 100,000 (output: 0)
greater than 1,000 but less than or equal to 10,000 (output: 1)
greater than 100 but less than or equal to 1000 (output: 2)
less than 100 (output: 3)

I could obviously achieve this in a very crude way by writing multiple conditional if statements but I am aware that there must be a more elegant way of achieving the result.

Various searches have indicate that I could possibly achieve this using pandas.cut / digitize however as of yet, I have been unsuccessful in achieving the required output.

Any assistance would be much appreciated.

Many thanks

James

Solution

you are right, you can use pd.cut combined with a groupby to achieve what you want.

Step 1: Define data

import pandas as pd
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000', 'between 10,000 and 100,000', 'between 100,000 and 250,000', 'greater than 250,000']

Step 2: Generate the category name using pd.cut, and set index for groupby later

df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)

Step 3: groupby to do the count

df.groupby(level='category').count()

Output:

EDIT

As pointed out in the comments numpy.histogram is another possibly more concise approach which will work. This answer used pd.cut as it was specifically mentioned in the question.