Tags: pandas · dataframe · group-by

Get counts for subgroups


I found the groupby command, which gets me counts based on bins of a single category. I would like help getting counts based on two different sets of bins in two different categories.

For example,

import pandas as pd

data = [
    {'a': 11, 'b': 1},
    {'a': 22, 'b': 1},
    {'a': 25, 'b': 5},
    {'a': 11, 'b': 1},
    {'a': 22, 'b': 2},
    {'a': 15, 'b': 6},
]
df = pd.DataFrame(data)
df.groupby(pd.cut(df["a"], [10, 20, 30])).count()

From the above, I can count how many objects have "a" from 10 to 20, and from 20 to 30.

But I am not sure how to further bin the output by "b". For example, I am interested in how many times "a" is from 10 to 20 with "b" from 1 to 5, and with "b" from 5 to 10. Likewise, how many times "a" is from 20 to 30 with "b" from 1 to 5, and with "b" from 5 to 10.

I could perhaps make a separate dataframe for each bin of "a" and then use groupby to count objects in bins of "b". Is there an easier way to use groupby with bins for both columns, without making these separate dataframes? Thank you.


Solution

  • You can provide multiple group keys:

    df.groupby(
        [
            pd.cut(df["a"], [10, 20, 30]),
            pd.cut(df["b"], [1, 5, 10]),
        ],
        observed=False,  # keep empty bin combinations; the default is changing to True
    ).count()
    
                      a  b
    a        b
    (10, 20] (1, 5]   0  0
             (5, 10]  1  1
    (20, 30] (1, 5]   2  2
             (5, 10]  0  0
    

    For more information, see the user guide: Group by § Splitting an object into groups
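
    If you prefer a table with the "a" bins as rows and the "b" bins as columns, one option (a sketch of the same grouping, not part of the original answer) is to use .size() instead of .count() and pivot the second level with .unstack():

    ```python
    import pandas as pd

    data = [
        {'a': 11, 'b': 1},
        {'a': 22, 'b': 1},
        {'a': 25, 'b': 5},
        {'a': 11, 'b': 1},
        {'a': 22, 'b': 2},
        {'a': 15, 'b': 6},
    ]
    df = pd.DataFrame(data)

    # size() counts rows per (a-bin, b-bin) group; unstack() pivots the
    # "b" bins into columns, giving a contingency-table view.
    counts = (
        df.groupby(
            [pd.cut(df["a"], [10, 20, 30]), pd.cut(df["b"], [1, 5, 10])],
            observed=False,  # keep zero-count bin combinations
        )
        .size()
        .unstack()
    )
    print(counts)
    ```

    Note that rows where "b" equals 1 fall outside the (1, 5] interval (pd.cut intervals are right-closed by default), so they are not counted in any bin.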