python, pandas, dataframe, binning

Converting columns from float datatype to categorical datatype using binning


I wish to convert both columns of a two-column data frame from float to categorical.

Here is the sample df:

    cost      numbers
1    360        23
2    120        35
3    2000       49

Both columns are float and I wish to convert them to categorical using binning. I wish to create the following bins for each column when converting to categorical.

Bins for the numbers column: 18-24, 25-44, 45-65, 66-92

Bins for cost column: >=1000, <1000

Finally, I don't want to create new columns; I want to convert the existing columns in place. Here is my attempt:

def PreprocessDataframe(df):
    # use binning to convert numbers and cost to categorical columns
    df['numbers'] = pd.cut(df['numbers'], bins=[18, 24, 25, 44, 45, 65, 66, 92])
    df['cost'] = pd.cut(df['cost'], bins=['=>1000', '<1000'])
    
    return df

I understand how to convert the "numbers" column but I am having trouble with the "cost" one. Help would be nice on how to solve this. Thanks in advance! Cheers!


Solution

  • If you use bins=[18, 24, 25, 44, 45, 65, 66, 92], pd.cut generates bins for 18-24, 24-25, 25-44, 44-45, and so on, and you don't need the ones for 24-25, 44-45, etc.

    By default, each bin excludes its left edge and includes its right edge, so the edges 18 and 24 cover values greater than 18 and up to and including 24.

    So, for numbers, you could instead use bins=[17, 24, 44, 65, 92] (note the 17 in the first position, so that 18 is included).

    The optional parameter labels lets you choose the labels for the bins.

    For cost, bins must be numeric edges as well; strings such as '<1000' cannot be used as edges and belong in labels instead.

    df['numbers'] = pd.cut(df['numbers'], bins=[17, 24, 44, 65, 92], labels=['18-24', '25-44', '45-65', '66-92'])
    df['cost'] = pd.cut(df['cost'], bins=[0, 999.99, df['cost'].max()], labels=['<1000', '=>1000'])
    
    print(df)
    
    >>> df
         cost numbers
    0   <1000   18-24
    1   <1000   25-44
    2  =>1000   45-65
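
One caveat about the cost bins above: with the edges [0, 999.99, max], a cost between 999.99 and 1000 would land in the upper bin, and a cost of exactly 0 falls outside the first bin and becomes NaN. A possible variant, offered as a sketch rather than the definitive way to do it, uses open-ended edges with np.inf and right=False so that each bin includes its left edge. The extra sample rows (999.995, 1000, 18, 92) are made up here only to exercise the boundaries, and the label '>=1000' follows the spelling in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a few invented boundary rows.
df = pd.DataFrame({
    "cost": [360.0, 120.0, 2000.0, 999.995, 1000.0],
    "numbers": [23.0, 35.0, 49.0, 18.0, 92.0],
})

# Default right=True: bins are (17, 24], (24, 44], (44, 65], (65, 92].
df["numbers"] = pd.cut(
    df["numbers"],
    bins=[17, 24, 44, 65, 92],
    labels=["18-24", "25-44", "45-65", "66-92"],
)

# right=False: bins are [-inf, 1000) and [1000, inf),
# so 1000 itself is labelled '>=1000' and nothing below it is lost.
df["cost"] = pd.cut(
    df["cost"],
    bins=[-np.inf, 1000, np.inf],
    right=False,
    labels=["<1000", ">=1000"],
)

print(df)
print(df.dtypes)
```

Both columns are overwritten in place, so no new columns are created, and values outside the numbers range (below 18 or above 92) would still become NaN.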