How can I replace pandas intervals with integers?
import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)
output:
age age_bands
0 43 (40, 50]
1 76 (70, 80]
2 27 (20, 30]
3 8 (0, 10]
4 57 (50, 60]
5 32 (30, 40]
6 12 (10, 20]
7 22 (20, 30]
Now I want to add another column that replaces each band with a single number (int), but I could not get it to work.
For example, this did not work:
df['age_code']= df['age_bands'].replace({'(40, 50]':4})
How can I get a column that looks like this?
age_bands age_code
0 (40, 50] 4
1 (70, 80] 7
2 (20, 30] 2
3 (0, 10] 0
4 (50, 60] 5
5 (30, 40] 3
6 (10, 20] 1
7 (20, 30] 2
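The replace call does not match anything because the column holds pd.Interval objects, not strings such as '(40, 50]'.
Assuming you want the first digit of every interval, you can use Series.apply
to achieve what you want as follows: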
df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])
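Slicing the string form only works while every lower edge is a single digit. As a hedged variant of the same element-wise idea, the sketch below reads the lower edge from each Interval through its left attribute and divides by the bin width. The width of 10 is an assumption taken from the bins in the question, and a list comprehension is used here instead of apply so the result is a plain integer column.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Assumption: every bin is 10 wide, as in the question
BIN_WIDTH = 10

# Each element is a pd.Interval; .left is its lower edge
df["age_code"] = [int(band.left) // BIN_WIDTH for band in df["age_bands"]]

print(df)
```

This assumes no age falls outside the bins; a value outside them would show up as NaN in age_bands and would need separate handling.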
However, note that this may not be very efficient for a large DataFrame.
To convert the column values to an integer dtype, you can use pd.to_numeric:
df["age_code"] = pd.to_numeric(df['age_code'])
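If the code you want is simply the position of the bin, a vectorized alternative may be simpler. The sketch below assumes the bins start at 0 and are 10 wide, as in the question, so the bin position happens to equal the first digit. It shows two options: passing labels=False to pd.cut, and reading the integer codes of the existing categorical column through .cat.codes. The column name age_code_from_bands is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Option 1: labels=False returns the integer position of each bin
df["age_code"] = pd.cut(df["age"], bins=age_band, labels=False)

# Option 2: reuse the categorical column that already exists
df["age_code_from_bands"] = df["age_bands"].cat.codes

print(df)
```

The two options differ for ages that fall outside the bins: labels=False gives NaN (so the column becomes float), while .cat.codes gives -1.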