I have a DataFrame df with an Age column containing continuous values. I would like to create a new DataFrame, new_df, in which the original continuous values are replaced by the categorical values I created by binning.
Is there a way to do this?
DataFrame (df):
    Customer_ID  Gender  Age
0    0002-ORFBO  Female   37
1    0003-MKNFE    Male   46
2    0004-TLHLJ    Male   50
3    0011-IGKFF    Male   78
4    0013-EXCHZ  Female   75
5    0013-MHZWF  Female   23
6    0013-SMEOE  Female   67
7    0014-BMAQU    Male   52
8    0015-UOCOJ  Female   68
9    0016-QLJIS  Female   43
10   0017-DINOC    Male   47
11   0017-IUDMW  Female   25
12   0018-NYROU  Female   58
13   0019-EFAEP  Female   32
14   0019-GFNTW  Female   39
15   0020-INWCK  Female   58
16   0020-JDNXP  Female   52
17   0021-IKXGC  Female   72
18   0022-TCJCI    Male   79
My code:
import pandas as pd

# Ages 0 to 3: Toddler
# Ages 4 to 17: Child
# Ages 18 to 25: Young Adult
# Ages 26 to 64: Adult
# Ages 65 to 99: Elder
pd.cut(df.Age, bins=[0, 3, 17, 25, 64, 99], labels=['Toddler', 'Child', 'Young Adult', 'Adult', 'Elder'])
If you really want it to be another DataFrame, make a copy of the original and then overwrite the copy's Age column with the result of pd.cut:
new_df = df.copy()
new_df['Age'] = pd.cut(new_df['Age'], bins=[0, 3, 17, 25, 64, 99], labels=['Toddler', 'Child', 'Young Adult', 'Adult', 'Elder'])
print(new_df)
# Output:
    Customer_ID  Gender          Age
0    0002-ORFBO  Female        Adult
1    0003-MKNFE    Male        Adult
2    0004-TLHLJ    Male        Adult
3    0011-IGKFF    Male        Elder
4    0013-EXCHZ  Female        Elder
5    0013-MHZWF  Female  Young Adult
6    0013-SMEOE  Female        Elder
7    0014-BMAQU    Male        Adult
8    0015-UOCOJ  Female        Elder
9    0016-QLJIS  Female        Adult
10   0017-DINOC    Male        Adult
11   0017-IUDMW  Female  Young Adult
12   0018-NYROU  Female        Adult
13   0019-EFAEP  Female        Adult
14   0019-GFNTW  Female        Adult
15   0020-INWCK  Female        Adult
16   0020-JDNXP  Female        Adult
17   0021-IKXGC  Female        Elder
18   0022-TCJCI    Male        Elder
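One thing to watch with these bin edges: by default pd.cut uses right-closed intervals and does not include the lowest edge, so the bins are (0, 3], (3, 17], and so on. An age of exactly 0 would therefore come out as NaN rather than Toddler. The sample data has no such row, so the output above is unaffected, but here is a minimal sketch of the difference. The ages Series is made up for illustration and is not part of the original data:

```python
import pandas as pd

# Hypothetical boundary ages (not from the original DataFrame),
# chosen to show which bin each edge value falls into.
ages = pd.Series([0, 3, 4, 17, 18, 25, 26, 64, 65, 99], name="Age")

bins = [0, 3, 17, 25, 64, 99]
labels = ["Toddler", "Child", "Young Adult", "Adult", "Elder"]

# Default behaviour: right=True, include_lowest=False,
# so the first interval is (0, 3] and age 0 is left unbinned (NaN).
default_cut = pd.cut(ages, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left,
# so age 0 is labelled Toddler.
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

comparison = pd.DataFrame(
    {"Age": ages, "default": default_cut, "include_lowest": inclusive_cut}
)
print(comparison)
```

If ages of 0 can occur in your data, passing include_lowest=True to the pd.cut call is probably what you want. Either way, the result is a categorical column, and ages outside the 0 to 99 range become NaN.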