Python Pandas: Using a user defined function to fill in a blank variable


I am trying to figure out a way to fill in a blank column using a user defined function. I have a Start Date column and an End Date column; the End Date is currently blank. The data has been read from a CSV into a pandas DataFrame called df.

Specifically, I want to build a user defined function that takes the date in the Start Date column, adds 1 year to it, and puts the result into the End Date column. Something like this:

Beginning Data-frame:

Start_Date        End_Date
 12/04/2013        NaN
 07/16/2012       NaN
 03/05/1999       NaN

Output with one year added:

Start_Date        End_Date
 12/04/2013       12/03/2014
 07/16/2012       07/15/2013
 03/05/1999       03/04/2000

I realize this can be done with the following code:

from datetime import timedelta
df['END_DATE'] = df['START_DATE'] + timedelta(days=365)

But I would really like to use a user defined function (if it is possible) along the lines of:

def add_1_year(x):
    ed = [x['START_DATE'] + timedelta(days=365)]
    return pd.Series(ed)

df['END_DATE'].apply(add_1_year)
df[['START_DATE','END_DATE']]

I hope this makes sense. Any suggestions would be greatly appreciated.

Thanks


Solution

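  • Assuming 'Start_Date' is already a datetime column (convert it with pd.to_datetime first if it was read from the CSV as text):

    from datetime import timedelta

    def add_1_year(x):
        # x is one row of df, passed in as a Series
        x['End_Date'] = x['Start_Date'] + timedelta(days=365)
        return x

    # apply returns a new DataFrame, so assign the result back
    df = df.apply(add_1_year, axis=1)

    With axis=1 the function is called once per row, so it can read one column and write another.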
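As an alternative to the row-wise version, here is a minimal sketch that applies the function to the Start_Date column only. It rebuilds the sample data from the question, and it assumes the dates are month/day/year strings that need parsing. The second function, add_calendar_year, is a hypothetical name that is not part of the original answer. It uses pd.DateOffset(years=1) in case a calendar year is wanted rather than a fixed 365 days, since the two differ when a leap day falls in between.

```python
from datetime import timedelta

import pandas as pd

# Sample data copied from the question; End_Date starts out empty.
df = pd.DataFrame({
    "Start_Date": ["12/04/2013", "07/16/2012", "03/05/1999"],
    "End_Date": [None, None, None],
})

# Parse the strings so that date arithmetic works (assumed month/day/year).
df["Start_Date"] = pd.to_datetime(df["Start_Date"], format="%m/%d/%Y")


def add_1_year(start):
    """Return the date exactly 365 days after a single start date."""
    return start + timedelta(days=365)


def add_calendar_year(start):
    """Return the same month and day one calendar year later (hypothetical helper)."""
    return start + pd.DateOffset(years=1)


# Series.apply passes one value at a time, so each function handles a single date.
df["End_Date"] = df["Start_Date"].apply(add_1_year)
calendar_end = df["Start_Date"].apply(add_calendar_year)

print(df)
print(calendar_end)
```

For the 1999 row, the 365-day version gives 2000-03-04 because 29 February 2000 falls inside the interval, while the calendar-year version gives 2000-03-05.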