Tags: python, r, pandas, dataframe, categorical-data

Convert numerical data to categorical in Python


I have a pandas DataFrame with a column fert_Rate holding fertility rates. I want a new column that holds these values as categories instead of numbers: instead of 1.0, 2.5, 4.0 I want labels such as Low, Medium and High. In R I would have written something like this:

    attach(mydata)
    mydata$fertcat[fert_Rate > 3.5] <- "High"
    mydata$fertcat[fert_Rate > 2 & fert_Rate <= 3.5] <- "Medium"
    mydata$fertcat[fert_Rate <= 2] <- "Low"
    detach(mydata)

Is there a similar way to do this in Python, or should I just loop over the column to build the new one?


Solution

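  • Use pd.cut to bin your data. Bins are closed on the right by default, so each label covers the interval (lower edge, upper edge]: 2 falls in Low and 3.5 in Medium, matching the <= comparisons in the R code.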

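    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'fert_Rate': [1, 2, 3, 3.5, 4, 5]})
    >>> # Open-ended outer edges so no rate falls outside the bins and becomes NaN
    >>> df.assign(fertility=pd.cut(df['fert_Rate'],
    ...                            bins=[-np.inf, 2, 3.5, np.inf],
    ...                            labels=['Low', 'Medium', 'High']))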
       fert_Rate fertility
    0        1.0       Low
    1        2.0       Low
    2        3.0    Medium
    3        3.5    Medium
    4        4.0      High
    5        5.0      High
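If you would rather spell out the conditions the way the R code does, numpy.select takes a list of boolean masks and a matching list of labels, so again no explicit loop is needed. This is a sketch reusing the sample data from the answer; fertcat is the column name from the R snippet, and the 'Unknown' default is an assumption that only applies to rows matching none of the conditions (for example a NaN rate).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fert_Rate': [1, 2, 3, 3.5, 4, 5]})

# One boolean mask per category, written like the R conditions.
conditions = [
    df['fert_Rate'] <= 2,
    (df['fert_Rate'] > 2) & (df['fert_Rate'] <= 3.5),
    df['fert_Rate'] > 3.5,
]
labels = ['Low', 'Medium', 'High']

# For each row, np.select takes the label of the first True condition;
# rows where no condition holds (e.g. a NaN rate) get the default.
df['fertcat'] = np.select(conditions, labels, default='Unknown')
print(df)
```

Unlike pd.cut, this produces plain strings rather than a categorical column. Wrap the result in pd.Categorical(df['fertcat'], categories=labels, ordered=True) if you need the categorical dtype; any 'Unknown' rows would then become NaN, since that value is not one of the categories.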
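The column pd.cut produces is not just strings: when labels are given it is an ordered categorical whose categories follow the bin order. The sketch below shows what that buys you, using open-ended bins (-np.inf and np.inf in place of 0 and 999, an assumption so that no value outside that range turns into NaN) and storing the result in a fertility column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fert_Rate': [1, 2, 3, 3.5, 4, 5]})
df['fertility'] = pd.cut(df['fert_Rate'],
                         bins=[-np.inf, 2, 3.5, np.inf],
                         labels=['Low', 'Medium', 'High'])

# pd.cut returns an ordered categorical: the labels keep the bin order.
print(df['fertility'].dtype)        # category
print(df['fertility'].cat.ordered)  # True

# Because the categories are ordered, comparisons follow Low < Medium < High.
print(df[df['fertility'] >= 'Medium'])

# With sort=False, counts are listed in category order, including empty ones.
print(df['fertility'].value_counts(sort=False))
```

Ordered categories also sort in Low, Medium, High order rather than alphabetically, which is usually what you want for tables and plots.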