python, pandas, datetime, time-series

Resample('W') weird results


I have a pandas DataFrame containing daily dates, and within this DataFrame, some dates are missing. I aim to generate a new time series that includes only the last day of each week from that DataFrame. For instance, if there are only Wednesday and Thursday entries for a specific week, the resulting time series should retain only the Thursday data point for that week.

For example, I tried the following:

import pandas as pd
import numpy as np

# Create a sample time series with date index
#SUNDAY, MONDAY, TUESDAY, WED, THURSDAY
date_list = ['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2023-10-05']

# Convert the date list to a pandas datetime index
date_rng = pd.to_datetime(date_list)

data = np.random.rand(len(date_rng))
time_series = pd.Series(data, index=date_rng)


# Resample the time series to weekly frequency and select the last observation for each week
weekly_last = time_series.resample('W').last()

# Print the weekday name of each resulting date
print(weekly_last.index.day_name().tolist())

This prints ['Sunday', 'Sunday'], whereas I expected ['Sunday', 'Thursday'].

How can I achieve what I want? Thank you very much for your help.


Solution

  • You can use the isocalendar() method to get the week of the year, and then group on that:

    df = pd.DataFrame({'date': date_rng, 'value': data})
    df['week'] = df['date'].dt.isocalendar().week
    grouped = df.groupby('week').last().set_index('date')
    weekly_last = grouped['value'].copy()
    

    Then you get the expected result:

    >>> weekly_last.index.day_name()
    Index(['Sunday', 'Thursday'], dtype='object', name='date')
    

    When you resample using 'W', you group the data by week, as you want. However, the label assigned to each group in the index is the Sunday that ends that week, not the last date actually present in the data. You can pick a different anchor day (for example 'W-THU'), but this doesn't help you, because the label is still a fixed weekday. So you basically need to do the same grouping but keep your original dates as labels.
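To make the labelling behaviour concrete, here is a small sketch using the dates from the question. The values are arbitrary placeholders rather than the random data from the question. It suggests that resample('W') labels each bin with the Sunday that closes it, and that a 'W-THU' anchor only moves the bin edges and labels rather than reusing the last date that is actually present.

```python
import pandas as pd

# Dates from the question: Sunday through Thursday
dates = pd.to_datetime(['2023-10-01', '2023-10-02', '2023-10-03',
                        '2023-10-04', '2023-10-05'])
# Placeholder values (assumption: any numbers work for this illustration)
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=dates)

# Default 'W' is anchored on Sunday: each bin is labelled with its closing Sunday
sunday_labels = ts.resample('W').last()
print(sunday_labels.index.day_name().tolist())

# A Thursday anchor shifts the week boundaries, so the Sunday entry is now
# grouped together with the following Monday to Thursday
thursday_labels = ts.resample('W-THU').last()
print(thursday_labels)
```

With the Thursday anchor, the lone Sunday no longer forms its own week, so the result has one row instead of the two that the question asks for.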

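    To do this, you can group on the numbered week of the year, take the last observation in each week, and use that observation's date for the index. This requires a few extra steps beyond a single call to resample/groupby, because grouping on the week number alone replaces the dates in the index with the week number.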

    A couple of notes:

    • If you directly group on the week of the year (time_series.groupby(time_series.index.isocalendar().week).last()), you lose the dates and will only see the week of the year in the index. That's why I collect things in a DataFrame - then the dates are preserved in their own column, and you can use set_index() to make them the index.
    • In older pandas versions you can also use the weekofyear attribute to get the week of the year. But I get a deprecation warning when using it, with an instruction to use isocalendar() instead, and the attribute was removed in pandas 2.0.
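As a variation on the approach above (not the method from the answer itself), the week grouping can be combined with tail(1), which returns the last row of each group while keeping its original index label, so no helper DataFrame is needed. The sketch below also groups on the ISO year alongside the week, on the assumption that the data may span more than one year. Grouping on the week number alone would merge the same week from different years. The sample dates and values are placeholders.

```python
import pandas as pd

# Placeholder data: the question's dates plus one date a year later that
# falls in the same ISO week number (assumption, for illustration only)
dates = pd.to_datetime(['2023-10-01', '2023-10-02', '2023-10-03',
                        '2023-10-04', '2023-10-05', '2024-10-02'])
ts = pd.Series(range(len(dates)), index=dates, dtype=float)

# isocalendar() returns a DataFrame with year, week and day columns
iso = ts.index.isocalendar()

# Group on (ISO year, ISO week) and keep the last row of each group;
# tail(1) preserves the original dates in the index
weekly_last = ts.groupby([iso.year, iso.week]).tail(1)

print(weekly_last.index.day_name().tolist())
```

Note that ISO weeks run from Monday to Sunday, which matches the week boundaries of the default resample('W'), so the groups are the same as in the original attempt.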