Tags: python, function, dataframe, data-science, data-analysis

Apply a function to a DataFrame's columns [Python]


I just wrote this function to calculate a person's age based on two columns in a pandas DataFrame. Unfortunately, if I use return, the function returns the same value for all rows, but if I use the print statement, the function gives me the right values.

Here is the code:

def calc_age(dataset):
    index = dataset.index
    for element in index:
        year_nasc = train['DT_NASCIMENTO_BENEFICIARIO'][element][6:]
        year_insc = train['ANO_CONCESSAO_BOLSA'][element]
        age = int(year_insc) - int(year_nasc)
        print ('Age: ', age)
        #return age

Sample values:

train['DT_NASCIMENTO_BENEFICIARIO'] = '03-02-1987'

train['ANO_CONCESSAO_BOLSA'] = 2009

What am I doing wrong?!


Solution

  • If what you want is to subtract the year of DT_NASCIMENTO_BENEFICIARIO from ANO_CONCESSAO_BOLSA, and df is your DataFrame:

    import pandas as pd

    # cast to datetime
    df["DT_NASCIMENTO_BENEFICIARIO"] = pd.to_datetime(df["DT_NASCIMENTO_BENEFICIARIO"])
    df["age"] = df["ANO_CONCESSAO_BOLSA"] - df["DT_NASCIMENTO_BENEFICIARIO"].dt.year
    
    # print the result, or do something else with it:
    print(df["age"])
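
As for why the original function misbehaves with return: a return inside a for loop exits the function on the first iteration, so only the first row's age is ever computed. If that single number is then assigned to a column, it is presumably broadcast to every row, which would explain seeing the same value everywhere. The sketch below reproduces this and shows two possible fixes. The sample rows, the helper name calc_age_first_only, and the day-month-year date format are assumptions made for illustration, not details confirmed by the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the two columns in the question.
train = pd.DataFrame({
    "DT_NASCIMENTO_BENEFICIARIO": ["03-02-1987", "25-12-1990"],
    "ANO_CONCESSAO_BOLSA": [2009, 2015],
})


def calc_age_first_only(dataset):
    """Mimics the original bug: return leaves the loop on the first row."""
    for element in dataset.index:
        year_nasc = dataset["DT_NASCIMENTO_BENEFICIARIO"][element][6:]
        year_insc = dataset["ANO_CONCESSAO_BOLSA"][element]
        return int(year_insc) - int(year_nasc)  # later rows are never visited


def calc_age(row):
    """Age for a single row: award year minus the birth year in the string."""
    return int(row["ANO_CONCESSAO_BOLSA"]) - int(row["DT_NASCIMENTO_BENEFICIARIO"][6:])


# Only one scalar comes back, no matter how many rows there are.
first_only = calc_age_first_only(train)

# Fix 1: let pandas call the function once per row.
train["age"] = train.apply(calc_age, axis=1)

# Fix 2: vectorized, assuming the dates are day-month-year strings.
birth_year = pd.to_datetime(
    train["DT_NASCIMENTO_BENEFICIARIO"], format="%d-%m-%Y"
).dt.year
train["age_vectorized"] = train["ANO_CONCESSAO_BOLSA"] - birth_year

print(first_only)
print(train[["age", "age_vectorized"]])
```

Passing an explicit format to pd.to_datetime avoids ambiguity between day-first and month-first dates. If the real data uses a different layout, the format string would need to change accordingly.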