Search code examples
pythonpandaspandas-groupbydayofweekweek-number

Pandas: Select highest day of the week, excluding weekends, unless one record


I've got a dataframe with dates, and I want to select the highest date in each week excluding weekends (so Fridays, if available), unless there is no Monday to Friday data and Saturday/Sunday are the only one's available.

The sample data can be setup like this:

dates = pd.Series(data=['2018-11-05', '2018-11-06', '2018-11-07', '2018-11-08', '2018-11-09',
                        '2018-11-12', '2018-11-13', '2018-11-14', '2018-11-15', '2018-11-17',
                        '2018-11-19',
                        '2018-12-01',
                        ])
nums = np.random.randint(50, 100, 12)
# nums
# array([95, 80, 81, 51, 98, 62, 50, 55, 59, 77, 69])

df = pd.DataFrame(data={'dates': dates, 'nums': nums})
df['dates'] = pd.to_datetime(df['dates'])

The records I want:

  • 2018-11-09 is Friday
  • 2018-11-15 is Thursday (not 2018-11-17 coz it's Saturday)
  • 2018-11-19 is Monday and the only record for that week
  • 2018-12-01 is Saturday but the only record for that week

My current solution is in the answer below but I don't think it's ideal and has some issues I had to work around. Briefly, it's:

  1. groupby week: df.groupby(df['dates'].dt.week).apply(some_function)
  2. if there's just one record for that week, return it
  3. otherwise, select the highest/latest record with day <= Friday and return that

Ideally, I'd like a way to write:

[latest Mon-Fri record] if [has Mon-Fri record] else [latest Sat-Sun record]

Solution

  • Create a new hierarchy of weekdays, where Saturday and Sunday are given the lowest priority. Then sort_values on this new ranking + groupby + .tail(1).

    import numpy as np
    
    wd_map = dict(zip(np.arange(0,7,1), np.roll(np.arange(0,7,1),-2)))
    # {0: 2, 1: 3, 2: 4, 3: 5, 4: 6, 5: 0, 6: 1}
    df = df.assign(day_mapped = df.dates.dt.weekday.map(wd_map)).sort_values('day_mapped')
    
    df.groupby(df.dates.dt.week).tail(1).sort_index()
    

    Output

            dates  nums  day_mapped
    4  2018-11-09    57           6
    8  2018-11-15    83           5
    10 2018-11-19    96           2
    11 2018-12-01    66           0
    

    If your data span multiple years, you'll need to group on both Year + week.