I have a dataframe with 4 columns. I created a new column and assigned all values as 'N' to it.
Assume that first two columns have some random info, Column3 = Year, Column4= week No., Now Column5 =week-ES (initally assigned all 'N')should be equal to 'Week No.' for the recent 5 weeks, and should be equal to 'Pastwk' for all weeks before that. How can I use group by along with the top 5 weeks clause? How do I do that? I used this code, But did not get desire result. the desired result is the table below:
df.sort_values(['Year','Week No.'],ascending=[False,False],inplace = True)
df['Week-ES'] = 'N'
df = df.groupby(['Year','Week No.']).size()
df['Week-ES'][:5]= df['Week No.'][:5]
#for i in range(5):
# df.loc[df['Week-ES'].index == i, 'Week-ES'] = df['Week No.'].iloc[i]
df.iloc[5:]['Week-ES'] = 'Past WK'
Col1 | Col2 | Year | WeekNo. | Week-ES |
---|---|---|---|---|
v1 | v2 | 2020 | 48 | Recent |
v2 | v3 | 2020 | 47 | Recent |
v3 | v4 | 2020 | 47 | Recent |
v4 | v5 | 2020 | 46 | Recent |
v5 | v6 | 2020 | 40 | Pastwk |
v6 | v7 | 2019 | 52 | PastWk |
Idea is convert values to week periods, subtract 5 Weeks and compare by weeks periods from datetimes from columns Year
and Week
by Series.ge
for greater or equal and pass to numpy.where
:
last = pd.to_datetime('now').to_period('W') - 5
print (last)
2020-11-09/2020-11-15
s = df['Year'].astype(str).add(df['Week'].astype(str).add('-1'))
dates = pd.to_datetime(s, format='%Y%W-%w').dt.to_period('W')
df['C'] = np.where(dates.ge(last), 'Recent', 'Pastwk')
print (df)
Year Week C
0 2020 48 Recent
1 2020 47 Recent
2 2020 47 Recent
3 2020 46 Recent
4 2020 40 Pastwk
5 2019 52 Pastwk