python pandas intervals

How can I replace pandas intervals with integers in Python?


import pandas as pd
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df['age_bands'] = pd.cut(df['age'], bins=age_band, ordered=True)

Output:

   age age_bands
0   43  (40, 50]
1   76  (70, 80]
2   27  (20, 30]
3    8   (0, 10]
4   57  (50, 60]
5   32  (30, 40]
6   12  (10, 20]
7   22  (20, 30]

Now I want to add another column that replaces each band with a single integer, but I could not get it to work.

For example, this did not work:

df['age_code']= df['age_bands'].replace({'(40, 50]':4})

How can I get a column that looks like this?

    age_bands   age_code
0   (40, 50]      4
1   (70, 80]      7
2   (20, 30]      2
3   (0, 10]       0
4   (50, 60]      5
5   (30, 40]      3
6   (10, 20]      1
7   (20, 30]      2

Solution

  • Assuming you want the first digit of every interval, you can use Series.apply as follows:

    df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])
    

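    However, note that this may not be very efficient for a large DataFrame. Because it reads a single character from the interval's string form, it also only works while every lower bound is below 100.

    Since str(band)[1] returns a string, the new column holds strings. To convert the column values to an integer dtype, you can use pd.to_numeric: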

    df["age_code"] = pd.to_numeric(df['age_code'])
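
As an alternative that avoids string handling, here is a minimal sketch of two vectorized options. It assumes, as in the question, that the bins start at 0 and are 10 wide, so the position of each bin happens to equal its tens digit. The column name age_code_from_bands is only illustrative and is not from the question.

```python
import pandas as pd

df = pd.DataFrame({"age": [43, 76, 27, 8, 57, 32, 12, 22]})
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

# Same binning as in the question: the column holds Interval categories.
df["age_bands"] = pd.cut(df["age"], bins=age_band)

# Option 1: ask pd.cut for the integer bin position instead of Interval labels.
df["age_code"] = pd.cut(df["age"], bins=age_band, labels=False)

# Option 2: reuse the existing categorical column. .cat.codes gives the
# position of each value within the ordered categories.
# (age_code_from_bands is a hypothetical column name used for illustration.)
df["age_code_from_bands"] = df["age_bands"].cat.codes

print(df)
```

Both options return the position of the bin, not a digit parsed from the label, so with different bin edges the codes would no longer match the first digit of the lower bound. Ages that fall outside every bin come back as NaN with labels=False and as -1 with .cat.codes, so it may be worth checking for those before relying on the codes.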