Search code examples
pythonpandaspandas-resample

How do I make Pandas resample align on the day for timeframes that don't divide into 24 hours?


The resample function doesn't work with 5h or 10h intervals. It starts okay and then it stops aligning on the day.

d = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
df = read.resample('5H').agg(d)
df

output:

                    Open    High    Low     Close
Date
2024-01-21 00:00:00 317.3   319.5   316.1   317.4
2024-01-21 05:00:00 317.4   319.4   317.2   319.0
2024-01-21 10:00:00 319.0   319.6   318.0   319.1
2024-01-21 15:00:00 319.1   321.6   317.5   320.6
2024-01-21 20:00:00 320.6   321.7   318.2   319.5
2024-01-22 01:00:00 319.5   320.1   314.1   317.2
2024-01-22 06:00:00 317.2   318.6   310.7   312.8
2024-01-22 11:00:00 312.8   315.3   308.4   311.1
2024-01-22 16:00:00 311.2   312.5   303.6   308.5
2024-01-22 21:00:00 308.5   308.7   304.8   308.2
2024-01-23 02:00:00 308.1   311.7   308.1   309.9
2024-01-23 07:00:00 310.0   310.2   291.7   295.9

It starts okay until the next day 2024-01-22.

I want to resample data like this:

  • 00:00 to 05:00 (this is 5 hours)
  • 05:00 to 10:00 (5 hours)
  • 10:00 to 15:00 (5 hours)
  • 15:00 to 20:00 (5 hours)
  • 20:00 to 00:00 next day (4 hours)

It works correctly with 2h, 3h, 4h, or 6h intervals

d = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
df = read.resample('4H').agg(d)
df

output:

                    Open    High    Low     Close
Date
2024-01-21 00:00:00 317.3   319.5   316.1   318.5
2024-01-21 04:00:00 318.5   318.9   317.2   317.5
2024-01-21 08:00:00 317.5   319.6   317.3   319.3
2024-01-21 12:00:00 319.3   319.6   317.5   318.9
2024-01-21 16:00:00 318.9   321.6   317.9   320.6
2024-01-21 20:00:00 320.6   321.7   318.2   318.6
2024-01-22 00:00:00 318.7   320.3   315.3   316.7
2024-01-22 04:00:00 316.7   318.6   314.1   317.8
2024-01-22 08:00:00 317.9   317.9   310.7   313.5
2024-01-22 12:00:00 313.4   315.3   308.4   311.1
2024-01-22 16:00:00 311.2   312.5   303.6   306.6

Solution

  • If you want to resample with 5h periods except the last one, so that it always starts at midnight, you should normalize your dates and use them as grouper for groupby.resample:

    Setup for this example

    N = 1000
    df = pd.DataFrame(np.random.random((N, 4)),
                      columns=['Open', 'High', 'Low', 'Close'],
                      index=pd.date_range('2024-01-21', periods=N, freq='10min')
                              .rename('Date')
                     )
    
    d = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
    

    Code

    out = (df.groupby(df.index.normalize(), group_keys=False)
             .resample('5h', origin='start_day').agg(d)
          )
    

    Example output (first three days):

                             Open      High       Low     Close
    Date                                                       
    2024-01-21 00:00:00  0.854852  0.991030  0.001056  0.434836
    2024-01-21 05:00:00  0.423933  0.993614  0.024808  0.970102
    2024-01-21 10:00:00  0.114825  0.992917  0.000970  0.423687
    2024-01-21 15:00:00  0.368081  0.934378  0.019684  0.474279
    2024-01-21 20:00:00  0.618224  0.991313  0.004855  0.828402
    2024-01-22 00:00:00  0.314505  0.981417  0.015882  0.672777
    2024-01-22 05:00:00  0.637003  0.928499  0.018432  0.081630
    2024-01-22 10:00:00  0.798733  0.939966  0.002545  0.779787
    2024-01-22 15:00:00  0.786353  0.980841  0.007235  0.622986
    2024-01-22 20:00:00  0.455437  0.985423  0.207164  0.492761
    2024-01-23 00:00:00  0.759215  0.943299  0.012884  0.610557
    2024-01-23 05:00:00  0.491669  0.907216  0.022135  0.204769
    2024-01-23 10:00:00  0.035259  0.950254  0.020136  0.533030
    2024-01-23 15:00:00  0.977045  0.958560  0.000002  0.791182
    2024-01-23 20:00:00  0.327000  0.953044  0.021118  0.559102