Search code examples
pythonpandasdataframegroup-byassign

How do I group by Week and Year, and depending on that, assign values to a third column having 'N' as all values initially?


I have a dataframe with 4 columns. I created a new column and assigned all values as 'N' to it.

Assume that first two columns have some random info, Column3 = Year, Column4= week No., Now Column5 =week-ES (initally assigned all 'N')should be equal to 'Week No.' for the recent 5 weeks, and should be equal to 'Pastwk' for all weeks before that. How can I use group by along with the top 5 weeks clause? How do I do that? I used this code, But did not get desire result. the desired result is the table below:

    df.sort_values(['Year','Week No.'],ascending=[False,False],inplace = True)
df['Week-ES'] = 'N'
df = df.groupby(['Year','Week No.']).size()
df['Week-ES'][:5]= df['Week No.'][:5]
#for i in range(5):
#    df.loc[df['Week-ES'].index == i, 'Week-ES'] = df['Week No.'].iloc[i]   
df.iloc[5:]['Week-ES'] = 'Past WK'
Col1 Col2 Year WeekNo. Week-ES
v1 v2 2020 48 Recent
v2 v3 2020 47 Recent
v3 v4 2020 47 Recent
v4 v5 2020 46 Recent
v5 v6 2020 40 Pastwk
v6 v7 2019 52 PastWk

Solution

  • Idea is convert values to week periods, subtract 5 Weeks and compare by weeks periods from datetimes from columns Year and Week by Series.ge for greater or equal and pass to numpy.where:

    last = pd.to_datetime('now').to_period('W') - 5
    print (last)
    2020-11-09/2020-11-15
    
    s = df['Year'].astype(str).add(df['Week'].astype(str).add('-1'))
    dates = pd.to_datetime(s, format='%Y%W-%w').dt.to_period('W')
    df['C'] = np.where(dates.ge(last), 'Recent', 'Pastwk')
    
    print (df)
       Year  Week       C
    0  2020    48  Recent
    1  2020    47  Recent
    2  2020    47  Recent
    3  2020    46  Recent
    4  2020    40  Pastwk
    5  2019    52  Pastwk