I have a pandas dataframe with the column fert_Rate for fertility rate. I want to have a new column with these values as categorical instead of numerical. Instead of 1.0, 2.5, 4.0 I want something like (low, medium, high). In R I would have written something like this:
attach(mydata)
mydata$fertcat[fert_Rate > 3.5] <- "High"
mydata$fertcat[fert_Rate > 2 & fert_Rate <= 3.5] <- "Medium"
mydata$fertcat[fert_Rate <= 2] <- "Low"
detach(mydata)
Is there a similar way to do this in Python, or should I just loop over the column to create it?
Use pd.cut to bin your data.
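>>> import pandas as pd
>>> df = pd.DataFrame({'fert_Rate': [1, 2, 3, 3.5, 4, 5]})
>>> # Bins are right-inclusive by default: (0, 2], (2, 3.5], (3.5, 999]
>>> df.assign(fertility=pd.cut(df['fert_Rate'],
...                            bins=[0, 2, 3.5, 999],
...                            labels=['Low', 'Medium', 'High']))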
   fert_Rate fertility
0        1.0       Low
1        2.0       Low
2        3.0    Medium
3        3.5    Medium
4        4.0      High
5        5.0      High
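One caveat with fixed outer edges like 0 and 999: any value at or below 0, or above 999, falls outside every bin and comes back as NaN. If you want the same open-ended behaviour as the R code (`<= 2`, `> 2 & <= 3.5`, `> 3.5`), a minimal sketch is to use infinite outer edges. The sample values below, including 0 and 1200, and the column name fertcat are only illustrative assumptions.

```python
import pandas as pd

# Illustrative data; 0 and 1200 would fall outside bins=[0, 2, 3.5, 999]
df = pd.DataFrame({'fert_Rate': [0, 1, 2, 3, 3.5, 4, 5, 1200]})

inf = float('inf')

# right=True is the default, so each bin includes its right edge:
# (-inf, 2] -> Low, (2, 3.5] -> Medium, (3.5, inf] -> High
df['fertcat'] = pd.cut(df['fert_Rate'],
                       bins=[-inf, 2, 3.5, inf],
                       labels=['Low', 'Medium', 'High'])

print(df)
```

The result is an ordered categorical column, so comparisons such as `df['fertcat'] > 'Low'` should work as well.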