I have this kind of DataFrame:
season Date Holiday_Name
12-13 11/1/12 NaN
12-13 11/2/12 Nan
12-13 3/31/13 Easter
12-13 4/5/13 NaN
13-14 11/1/13 NaN.
13-14 4/18/14 Nan.
13-14 4/20/14 Easter.
13-14 4/22/14 Nan.
Etc...
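For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample rows above as a DataFrame. It assumes the dates are month/day/two-digit-year strings, and that the trailing dots and the "Nan" strings are typos rather than real values (missing holidays are stored as real missing values here).

```python
import pandas as pd

# Sample rows from the question, with the stray trailing dots removed and
# missing holidays stored as real missing values (assumption: they are typos).
df = pd.DataFrame({
    "season": ["12-13", "12-13", "12-13", "12-13", "13-14", "13-14", "13-14", "13-14"],
    "Date": ["11/1/12", "11/2/12", "3/31/13", "4/5/13", "11/1/13", "4/18/14", "4/20/14", "4/22/14"],
    "Holiday_Name": [None, None, "Easter", None, None, None, "Easter", None],
})

# Parse the month/day/two-digit-year strings into datetimes so that
# subtracting two dates yields Timedelta values.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
```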
What I need is a new column that holds, for each row, the number of days between its date and the Easter date of the same season.
I've tried groupby, for loops (even though I know that's not the right approach) and the where method, but nothing seems to work. For example:
dataset["difference"] = dataset["Date"] -dataset["Date"].where(dataset["holiday_name"]=="Easter").days
but it gives me this error:
'Series' object has no attribute 'days'
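This error has two causes. Because of operator precedence, `.days` is applied to the `where(...)` result rather than to the difference, and even on the difference a Series has no `.days` attribute: the per-element day count is exposed through the `.dt` accessor. A small sketch with three made-up dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2013-03-25", "2013-03-31", "2013-04-05"]))
easter = pd.Timestamp("2013-03-31")

# Subtracting a Timestamp from a datetime Series gives a timedelta64 Series.
delta = dates - easter

# A Series has no .days attribute; the day count lives behind the .dt accessor.
days = delta.dt.days
print(days.tolist())  # [-6, 0, 5]
```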
I also tried:
dataset['differenza_pasqua'] = pd.Index(dataset["Data"] -dataset["Data"].where(dataset["holiday_name"]=="Pasqua di Resurrezione").dropna()).days
With this one, the Easter rows get 0, but all the other rows are marked as NaN.
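The NaN values come from index alignment: arithmetic between two Series matches rows by index label, and after `.where(...).dropna()` only the Easter row's label is left, so every other row has no partner. A minimal sketch with three made-up dates in a single season:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2013-03-25", "2013-03-31", "2013-04-05"]))
is_easter = pd.Series([False, True, False])

# where() keeps the Easter date and puts NaT everywhere else; dropna() then
# leaves a one-element Series whose only index label is 1.
easter_only = dates.where(is_easter).dropna()

# Series arithmetic aligns on index labels, so labels 0 and 2 have no
# matching row and come out as NaT.
diff = dates - easter_only
print(diff)

# Using the single Easter date as a scalar broadcasts it to every row instead.
fixed = dates - easter_only.iloc[0]
print(fixed.dt.days.tolist())  # [-6, 0, 5]
```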
What I expect is something like this:
season Date Holiday_Name difference
12-13 11/1/12 NaN 150
12-13 11/2/12 NaN 149
12-13 3/31/13 Easter 0
12-13 4/5/13 NaN 5
13-14 11/1/13 NaN 170
13-14 4/18/14 NaN 2
13-14 4/20/14 Easter 0
13-14 4/22/14 NaN 2
Thanks for your help.
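This is easy to solve with groupby:
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')  # parse the m/d/yy strings first
ddf = df.groupby('season').apply(lambda x: x['Date'] - x.loc[x['Holiday_Name'] == 'Easter', 'Date'].iloc[0]).reset_index()
df['difference'] = ddf['Date']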
season Date Holiday_Name difference
0 12-13 2012-11-01 NaN -150 days
1 12-13 2012-11-02 Nan -149 days
2 12-13 2013-03-31 Easter 0 days
3 12-13 2013-04-05 NaN 5 days
4 13-14 2013-11-01 NaN -170 days
5 13-14 2014-04-18 Nan -2 days
6 13-14 2014-04-20 Easter 0 days
7 13-14 2014-04-22 Nan 2 days
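As an alternative (a different technique from the groupby/apply above, not a fix of it), the same column can be computed with `transform`, which returns a result aligned to `df`'s own index. The `apply(...).reset_index()` version only lines up with the original rows when they are already sorted by season and `df` has a default 0..n-1 index. A minimal sketch, assuming `Date` is already parsed and each season has one "Easter" row (a season without one gets NaN):

```python
import pandas as pd

df = pd.DataFrame({
    "season": ["12-13", "12-13", "12-13", "12-13", "13-14", "13-14", "13-14", "13-14"],
    "Date": pd.to_datetime(["2012-11-01", "2012-11-02", "2013-03-31", "2013-04-05",
                            "2013-11-01", "2014-04-18", "2014-04-20", "2014-04-22"]),
    "Holiday_Name": [None, None, "Easter", None, None, None, "Easter", None],
})

# Keep the date only on Easter rows (NaT elsewhere), then spread each season's
# first non-null date (its Easter) to every row of that season.
easter = (
    df["Date"]
    .where(df["Holiday_Name"] == "Easter")
    .groupby(df["season"])
    .transform("first")
)

# Signed difference in whole days: negative before Easter, positive after.
df["difference"] = (df["Date"] - easter).dt.days

# The expected output in the question uses distances, i.e. the absolute value.
df["abs_difference"] = df["difference"].abs()
print(df)
```

Using `.dt.days` gives plain integers instead of the `-150 days` Timedelta values shown above.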
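Note: you need to remove the trailing dots from your data first (as in "Nan." and "Easter."); otherwise "Easter." is not equal to 'Easter', no Easter row is found for that season, and `.iloc[0]` fails.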
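A minimal sketch of that cleanup, assuming `Holiday_Name` holds strings where missing entries were typed as "NaN"/"Nan" (sometimes with a trailing dot) rather than stored as real missing values:

```python
import numpy as np
import pandas as pd

raw = pd.Series(["NaN", "Nan", "Easter", "NaN.", "Nan.", "Easter."])

# Strip the stray trailing dots, then turn the "NaN"/"Nan" placeholder strings
# into real missing values so that == "Easter" and isna() behave as expected.
clean = raw.str.rstrip(".").replace({"NaN": np.nan, "Nan": np.nan})
print(clean.tolist())
```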