Binning in Pandas


Given the following dataframe in Pandas:

"Age","Gender","Impressions","Clicks","Signed_In"
36,0,3,0,1
73,1,3,0,1
30,0,3,0,1
49,1,3,0,1
47,1,11,0,1

I need to add a categorical column that holds an age-bin label for each row. For instance, for the row:

36,0,3,0,1

I want another column to show 'Between 35 and 45'.

The final record should look like this:

36,0,3,0,1,'Between 35 and 45'

Solution

  • You should create a sample set of data to help people answer your questions:

    import pandas as pd
    import numpy as np
    d = {'Age': [36, 73, 30, 49, 47],
         'Gender': [0, 1, 0, 1, 1],
         'Impressions': [3, 3, 3, 3, 11],
         'Clicks': [0, 0, 0, 0, 0],
         'Signed_In': [1, 1, 1, 1, 1]}
    df = pd.DataFrame(d)
    

    That way people can copy and paste it instead of having to recreate your data by hand.

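    NumPy's round function accepts a negative number of decimal places; -1 rounds each age to the nearest ten (exact midpoints such as 45 go to the even multiple, 40):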

    df['Age_rounded'] = np.round(df['Age'], -1)
    
        Age Clicks  Gender  Impressions Signed_In   Age_rounded
    0   36  0       0       3           1           40
    1   73  0       1       3           1           70
    2   30  0       0       3           1           30
    3   49  0       1       3           1           50
    4   47  0       1       11          1           50
    

    You can then map a dictionary onto those values:

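    # Keys are the rounded ages; values are the bin labels.
    categories_dict = {30: 'Between 25 and 35',
                       40: 'Between 35 and 45',
                       50: 'Between 45 and 55',
                       70: 'Between 65 and 75'}

    # Rounded ages that are missing from the dict (e.g. 60) map to NaN.
    df['category'] = df['Age_rounded'].map(categories_dict)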
    
        Age Clicks  Gender  Impressions Signed_In   Age_rounded category
    0   36  0       0       3           1           40          Between 35 and 45
    1   73  0       1       3           1           70          Between 65 and 75
    2   30  0       0       3           1           30          Between 25 and 35
    3   49  0       1       3           1           50          Between 45 and 55
    4   47  0       1       11          1           50          Between 45 and 55
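
As an alternative to rounding and mapping, pandas has a built-in binning function, pd.cut, which assigns the labels in one step. The sketch below is only an illustration: the bin edges (25 to 75 in steps of ten) and the choice of right=False are assumptions made to reproduce the labels above, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({'Age': [36, 73, 30, 49, 47]})

# Assumed bin edges, chosen to match the labels used in the answer above.
edges = [25, 35, 45, 55, 65, 75]
labels = [f'Between {lo} and {hi}' for lo, hi in zip(edges[:-1], edges[1:])]

# right=False makes each bin include its left edge: [25, 35), [35, 45), ...
# Ages outside the edges (below 25, or 75 and above) come back as NaN.
df['category'] = pd.cut(df['Age'], bins=edges, labels=labels, right=False)

print(df)
```

Because the edges are explicit, it is clear which bin a boundary age such as 45 falls into, and the result is a categorical column rather than plain strings. Extend the edges list if your data contains ages outside this range.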