Search code examples
pythonpandasdataframesumsampling

Pandas upsampling a dataframe and make a cumulative sum on a specific window


I have a dataframe with values every 3 hours, which I upsample to 1 hour. The new bins are kept empty with NaNs. I want to fill those NaNs with values which when summed are equal to the value of the bin that was not upsampled, and "de-sum" the value of the bin not-upsampled.

For example: I have 3 bins. The 3rd bin has a value of 3, bins 1 & 2 have NaNs. I want to fill bins 1,2 & 3 with 1 for each of them. In the end, if I had a cumulative sum every 3 bins, the result would be equal to the value of my bin before the upsampling.

I wrote an example of what I mean (sorry I have trouble explaining clearly). Is there a better way to do that ?

import numpy as np
import pandas as pd
from datetime import *

# Create df with a datetime index every 3 hours
rng = pd.date_range('2000-01-01', periods=365*(24/3), freq='3H') 
df = pd.DataFrame({'Val': np.random.randn(len(rng)) }, index = rng)

# Upsample to 1H but keep the new bins empty
df = df.resample('1H').asfreq()

# Create a copy of df to verify that the sum went well
df_summed_every_3_bins = df.copy()

# Create a counter to the next bin holding a value
to_full_bin = 2

# We de-sum the first value
df.Val.values[0] = df.Val.values[0]/3
for i in range(1,len(df)):
    
    # Take the value from a bin, divide it by 3 and insert it in the dataframe
    df.Val.values[i] = df.Val.values[i+to_full_bin]/3
    
    # We move forward in df, meaning that the bin with a value is approaching. So we reduce the counter by 1
    to_full_bin = to_full_bin-1
    
    # When the variable is equal to -1, it means we need to reinitialized our counter
    if to_full_bin == -1:
        to_full_bin = 2

Solution

  • Resample the dataframe, then backfill and divide by 3

    df.resample('1H').bfill().div(3)
    

                              Val
    2000-01-01 00:00:00 -0.747733
    2000-01-01 01:00:00 -0.057699
    2000-01-01 02:00:00 -0.057699
    2000-01-01 03:00:00 -0.057699
    2000-01-01 04:00:00 -0.409512
    2000-01-01 05:00:00 -0.409512
    2000-01-01 06:00:00 -0.409512
    2000-01-01 07:00:00 -0.108856
    2000-01-01 08:00:00 -0.108856
    2000-01-01 09:00:00 -0.108856
    ...