Search code examples
pythonpandasdataframebinning

Pandas binning column values according to the index


Currently, I have a DataFrame which holds the ages of a population of people and the frequency of those ages, like this:

      freq
 27    103
 28     43
 29     13
...    ...
 78     20
 79     13

The age is the index of the DataFrame. I would like to perform some Pandas magic so that I get a binned DataFrame like this:

           freq
 (20, 30]   308
 (30, 40]   111
 (40, 50]    85
 (50, 60]    58
 (60, 70]    63
 (70, 80]   101

So the index is now made up of intervals of ages, rather than individual ages and the frequencies are summed accordingly. How can I accomplish this?


Solution

  • You can use groupby after using cut to bin the index of the DataFrame. For example:

    >>> df = pd.DataFrame({'freq': [2, 3, 5, 7, 11, 13]}, 
                          index=[22, 29, 30, 31,25, 42])
    
    >>> df
        freq
    22     2
    29     3
    30     5
    31     7
    25    11
    42    13
    

    Then:

    >>> df.groupby(pd.cut(df.index, np.arange(20, 60, 10))).sum()
              freq
    (20, 30]    21
    (30, 40]     7
    (40, 50]    13
    

    np.arange(20, 60, 10) defines the bins that will be used; you can adjust these according to max/min values in your 'freq' column.