
Add a column with the difference between dates in a pandas DataFrame


I have this kind of DataFrame:

season       Date          Holiday_Name  
12-13        11/1/12          NaN        
12-13        11/2/12          Nan        
12-13        3/31/13         Easter        
12-13         4/5/13           NaN           

13-14        11/1/13          NaN.  
13-14        4/18/14          Nan.   
13-14        4/20/14         Easter.  
13-14        4/22/14          Nan.   

Etc...

What I need is a new column that gives, for each season, the number of days between each date and that season's Easter.

I've tried groupby, for loops (even though I know that's the wrong approach) and the where method, but nothing seems to work. For example:

dataset["difference"] = dataset["Date"] -dataset["Date"].where(dataset["holiday_name"]=="Easter").days

but it gives me this error:

'Series' object has no attribute 'days'

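or, with the names from my real dataset (in Italian: the date column is "Data", Easter is labelled "Pasqua di Resurrezione" and the new column is "differenza_pasqua", i.e. "Easter difference"):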

dataset['differenza_pasqua'] = pd.Index(dataset["Data"] -dataset["Data"].where(dataset["holiday_name"]=="Pasqua di Resurrezione").dropna()).days

With this one the Easter rows are correctly set to 0, but all the other rows are NaN.
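For reference, both attempts fail for standard pandas reasons, which a tiny example makes visible. This is a sketch using three rows from the 12-13 season above, assuming Date has already been parsed to datetime and that the empty holiday names are real missing values. A Series of timedeltas only exposes the day count through the .dt accessor, and where() leaves NaT on every non-Easter row, so every difference except Easter's is missing. Dropping those rows does not help, because subtraction aligns on the index.

```python
import pandas as pd

# Three rows from the 12-13 season in the question, dates already parsed
df = pd.DataFrame({
    "Date": pd.to_datetime(["2012-11-02", "2013-03-31", "2013-04-05"]),
    "Holiday_Name": [None, "Easter", None],  # missing holiday names
})

easter = df["Date"].where(df["Holiday_Name"] == "Easter")  # NaT except on the Easter row

# Attempt 1: a Series has no .days; a timedelta Series exposes it as .dt.days
diff = df["Date"] - easter
days = diff.dt.days  # NaN, 0.0, NaN: each row is compared with its own NaT

# Attempt 2: after dropna() only label 1 is left, and subtraction aligns on
# the index, so labels 0 and 2 get NaT
aligned = df["Date"] - easter.dropna()
```

The fix in both cases is to broadcast each season's Easter date to every row of that season before subtracting.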

What I expect is something like this:

season       Date          Holiday_Name      difference  
12-13        11/1/12          NaN               150    
12-13        11/2/12          NaN               149
12-13        3/31/13         Easter               0
12-13        4/5/13           NaN                 5

13-14        11/1/13          NaN               150
13-14        4/18/14          Nan                 2
13-14        4/20/14         Easter               0
13-14        4/22/14          Nan                 2

Thanks for your help.


Solution

  • This can be solved with groupby:

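    import pandas as pd

    # Dates must be real datetimes (not 'm/d/yy' strings) for the subtraction to work
    df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
    # group_keys=False keeps the original row labels, so the result aligns with df
    df['difference'] = df.groupby('season', group_keys=False).apply(
        lambda x: x['Date'] - x.loc[x['Holiday_Name'] == 'Easter', 'Date'].iloc[0])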
    
      season       Date Holiday_Name difference
    0  12-13 2012-11-01          NaN  -150 days
    1  12-13 2012-11-02          Nan  -149 days
    2  12-13 2013-03-31       Easter     0 days
    3  12-13 2013-04-05          NaN     5 days
    4  13-14 2013-11-01          NaN  -170 days
    5  13-14 2014-04-18          Nan    -2 days
    6  13-14 2014-04-20       Easter     0 days
    7  13-14 2014-04-22          Nan     2 days
    

    Note: you first need to remove the trailing dots from your data ("Nan.", "Easter."), otherwise "Easter." will not match 'Easter' in the comparison.
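A vectorised alternative that avoids groupby.apply, and also handles the trailing dots mentioned in the note, could look like the sketch below. It assumes the sample data from the question, with the "NaN"/"Nan" entries kept as plain strings. The column abs_difference is a hypothetical name, added because the expected output lists unsigned day counts (4/5/13, five days after Easter, and 4/18/14, two days before it, both appear as positive numbers).

```python
import pandas as pd

# Sample data copied from the question, trailing dots included; the "NaN"/"Nan"
# entries are kept as plain strings here (an assumption about the real data)
df = pd.DataFrame({
    "season": ["12-13"] * 4 + ["13-14"] * 4,
    "Date": ["11/1/12", "11/2/12", "3/31/13", "4/5/13",
             "11/1/13", "4/18/14", "4/20/14", "4/22/14"],
    "Holiday_Name": ["NaN", "Nan", "Easter", "NaN",
                     "NaN.", "Nan.", "Easter.", "Nan."],
})

# Strip trailing dots so "Easter." matches "Easter"
df["Holiday_Name"] = df["Holiday_Name"].str.rstrip(".")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Easter date on Easter rows, NaT elsewhere; transform("first") takes each
# season's first non-missing value and repeats it on every row of that season
easter = df["Date"].where(df["Holiday_Name"] == "Easter")
easter_per_row = easter.groupby(df["season"]).transform("first")

# Signed day counts (negative before Easter), like the groupby answer above
df["difference"] = (df["Date"] - easter_per_row).dt.days
# Unsigned day counts, matching the positive numbers in the expected output
df["abs_difference"] = df["difference"].abs()
```

One design difference: if a season has no Easter row, easter_per_row is NaT for that season, so its difference comes out as NaN (and the column becomes float) instead of raising an IndexError from .iloc[0] as in the apply version.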