Search code examples
pandasdatetimepandas-groupbyaggregateweekday

pandas grooup by according to group of days of the week selected


I have this dataframe:

rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng))+3

df = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})

let's say that I want to group it according to the day of the week and compute the mean:

df['span'] = np.where((df['time'].dt.day_of_week <= 2 , 'Th-Sn', 'Mn-Wd')
df['wkno'] = df['time'].dt.isocalendar().week.shift(fill_value=0) 
df.groupby(['wkno','span']).mean()

However, I would like to make this procedure more general.

Let's say that I define the following day is the week:

days=['Monday','Thursday']

Is there any option that allows me to do what I have done by using "days". I imagine that I have to compute the number of day between 'Monday','Thursday' and then I should use that number. What about the case when

days=['Monday','Thursday','Friday']

I was thinking to set-up a dictionary as:

days={'Monday':0,'Thursday':3,'Friday':4}

then

idays = list(days.values())[:]

How can I use now idays inside np.where? Indeed I have three interval.

Thanks


Solution

  • If you want to use more than one threshold you need np.searchsorted the resulting function would look something like

    def groupby_daysspan_week(dfc,days):
        df = dfc.copy()
        day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
                            'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
        short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
        day_split = [day_to_dayofweek[d] for d in days]
        df['wkno'] = df['time'].dt.isocalendar().week
        df['dow'] = df['time'].dt.day_of_week
        df['span'] = np.searchsorted(day_split,df['dow'],side='right')
        span_name_dict = {i+1:short_dict[day_split[i]]+'-'+short_dict[(day_split+[6])[i+1]] 
                          for i in range(len(day_split))}
        df_agg = df.groupby(['wkno','span'])['value'].mean()
        df_agg = df_agg.rename(index=span_name_dict,level=1)
        return df_agg