Search code examples
pandasdatetimedatetimeoffset

Pandas - Convert datetime column to start of week date (Sunday)


I've been trying a handful of methods here and can't seem to get it right. What I want to do is look at a datetime stamp column, and create a new column that has the Sunday start of that week.

It MOSTLY works, except if the datetime stamp is on a Sunday, then instead of giving that Sunday as the start of the week, it refers back to the previous Sunday.

Sample dataset:

import pandas as pd

data = [
['0',   '2022-05-22 00:25:13'],
['1',   '2022-05-25 14:59:50'],
['2',   '2022-05-28 17:32:37'],
['3',   '2022-08-15 11:07:52'],
['4',   '2022-04-08 16:52:39']]

columns = ['id', 'event_timestamp_et']

df = pd.DataFrame(data, columns=columns)
df['event_timestamp_et'] = pd.to_datetime(df['event_timestamp_et'])

Expected output:

    id  event_timestamp_et  startWeek
0   0   2022-05-22 00:25:13 2022-05-22
1   1   2022-05-25 14:59:50 2022-05-22
2   2   2022-05-28 17:32:37 2022-05-22
3   3   2022-08-15 11:07:52 2022-08-14
4   4   2022-04-08 16:52:39 2022-04-03

Here is what I have tried (I've included a few methods), as you can see, not quite working:

df['startWeek'] = df['event_timestamp_et'] - pd.offsets.Week(weekday=6)
df['startWeek'] = df['startWeek'].dt.normalize()

df['startWeek1'] = df['event_timestamp_et'] - pd.to_timedelta(df['event_timestamp_et'].dt.dayofweek, unit='d')
df['startWeek2'] = df['event_timestamp_et'].dt.to_period('W-SUN').apply(lambda r: r.start_time)
df["startWeek3"] = df['event_timestamp_et'].dt.to_period('W').dt.start_time + pd.Timedelta(6, unit='d')

Actual output:

    id  event_timestamp_et  startWeek   startWeek1  startWeek2  startWeek3
0   0   2022-05-22 00:25:13 2022-05-15  2022-05-16 00:25:13 2022-05-16  2022-05-22
1   1   2022-05-25 14:59:50 2022-05-22  2022-05-23 14:59:50 2022-05-23  2022-05-29
2   2   2022-05-28 17:32:37 2022-05-22  2022-05-23 17:32:37 2022-05-23  2022-05-29
3   3   2022-08-15 11:07:52 2022-08-14  2022-08-15 11:07:52 2022-08-15  2022-08-21
4   4   2022-04-08 16:52:39 2022-04-03  2022-04-04 16:52:39 2022-04-04  2022-04-10

Solution

  • One way using W-SAT not W-SUN (which is same as W btw)

    df["event_timestamp_et"].dt.to_period("W-SAT").dt.start_time
    

    Output:

    0   2022-05-22
    1   2022-05-22
    2   2022-05-22
    3   2022-08-14
    4   2022-04-03
    Name: event_timestamp_et, dtype: datetime64[ns]
    

    Validation:

    s = pd.Series(["2022-05-21 00:25:13",
                   "2022-05-22 00:25:13",
                   "2022-05-23 00:25:13",
                   "2022-05-24 00:25:13",
                   "2022-05-25 00:25:13",
                   "2022-05-26 00:25:13",
                   "2022-05-27 00:25:13",
                   "2022-05-28 00:25:13",
                   "2022-05-29 00:25:13",
                   "2022-05-30 00:25:13",])
    df = pd.to_datetime(s).to_frame("date")
    df["name"] = df["date"].dt.day_name()
    df["start"] = df["date"].dt.to_period("W-SAT").dt.start_time
    df["start_day"] = df["start"].dt.day_name()
    

    Output:

                     date       name      start start_day
    0 2022-05-21 00:25:13   Saturday 2022-05-15    Sunday
    1 2022-05-22 00:25:13     Sunday 2022-05-22    Sunday
    2 2022-05-23 00:25:13     Monday 2022-05-22    Sunday
    3 2022-05-24 00:25:13    Tuesday 2022-05-22    Sunday
    4 2022-05-25 00:25:13  Wednesday 2022-05-22    Sunday
    5 2022-05-26 00:25:13   Thursday 2022-05-22    Sunday
    6 2022-05-27 00:25:13     Friday 2022-05-22    Sunday
    7 2022-05-28 00:25:13   Saturday 2022-05-22    Sunday
    8 2022-05-29 00:25:13     Sunday 2022-05-29    Sunday
    9 2022-05-30 00:25:13     Monday 2022-05-29    Sunday