Search code examples
pythonpandaspandas-resample

Define a 'minimum' for pandas.DataFrame.resample() that is lower than current dataframe's minimum TimedeltaIndex


I have a dataframe for a protocol that tracks the value of 2 settings every minute of a test. For example:

In [1]: df = pd.DataFrame(
            {
                "time": [
                    pd.Timedelta(1, unit="min"),
                    pd.Timedelta(2, unit="min"),
                    pd.Timedelta(3, unit="min"),
                    pd.Timedelta(4, unit="min"),
                    pd.Timedelta(5, unit="min"),
                ],
                "setting_1": [4.0, 4.0, 6.0, 6.0, 8.0],
                "setting_2": [1.0, 2.0, 3.0, 4.0, 5.0],
            }
        ).set_index("time")

In [2]: df.head()
Out[2]:
                 setting_1  setting_2
time                         
0 days 00:01:00        4.0        1.0
0 days 00:02:00        4.0        2.0
0 days 00:03:00        6.0        3.0
0 days 00:04:00        6.0        4.0
0 days 00:05:00        8.0        5.0

I need to join this dataframe with another which contains the outcome measures of the test but the data in that dataframe is sampled every 10s. Thus, I expand df so the timedelta index increases by 10s and the missing values are backfilled.

In [3]: df = df.resample("10S").bfill()

In [4]: df.head()
Out[4]:
                 setting_1  setting_2
time                         
0 days 00:01:00        4.0        1.0
0 days 00:01:10        4.0        1.0
0 days 00:01:20        4.0        1.0
0 days 00:01:30        4.0        1.0
0 days 00:01:40        4.0        1.0

However, I want the index to start at a timedelta of 10s (i.e., 0 days 00:00:10) rather than the 0 days 00:01:00 which is the minimum value in the csv file the data is imported from. As the values of setting_1 and setting_2 over that first minute are represented by the values at 1min, they should also be backfilled.

I currently solve this by concatenating a new dataframe containing a single row with an index of pd.Timedelta(10, unit="s") and the column values corresponding to the 1 min row to the original df. I can then use .resample().bfill() as before to obtain what I need.

In [5]: df = pd.concat(
            [
                df,
                pd.DataFrame(
                    {
                        "time": [pd.Timedelta(10, unit="s")],
                        "setting_1": [df.iloc[0, 0]
                        "setting_2": [df.iloc[0, 1]
                    }
                ).set_index("time")
            ]
        )

In [6]: df
Out[6]:
                 setting_1  setting_2
time                         
0 days 00:01:00        4.0        1.0
0 days 00:02:00        4.0        2.0
0 days 00:03:00        6.0        3.0
0 days 00:04:00        6.0        4.0
0 days 00:05:00        8.0        5.0
0 days 00:00:10        4.0        1.0

In [7]: df = df.resample("10S").bfill()

In [8]: df.head()
Out[8]:
                 setting_1  setting_2
time                         
0 days 00:00:10        4.0        1.0
0 days 00:00:20        4.0        1.0
0 days 00:00:30        4.0        1.0
0 days 00:00:40        4.0        1.0
0 days 00:00:50        4.0        1.0

Is there a better way to achieve this without the need for the intermediary step of concatenating a dummy dataframe? I.e., is there a way to resample below the current minimum TimedeltaIndex by defining some sort of 'minimum' setting for the .resample() method?


Solution

  • Rather reindex with a new timedelta_range

    idx = pd.timedelta_range('0', df.index.max(), freq='10s')
    
    out = df.reindex(idx).bfill()
    

    Output:

                     setting_1  setting_2
    0 days 00:00:00        4.0        1.0
    0 days 00:00:10        4.0        1.0
    0 days 00:00:20        4.0        1.0
    0 days 00:00:30        4.0        1.0
    0 days 00:00:40        4.0        1.0
    0 days 00:00:50        4.0        1.0
    0 days 00:01:00        4.0        1.0
    0 days 00:01:10        4.0        2.0
    0 days 00:01:20        4.0        2.0
    0 days 00:01:30        4.0        2.0
    0 days 00:01:40        4.0        2.0
    0 days 00:01:50        4.0        2.0
    0 days 00:02:00        4.0        2.0
    0 days 00:02:10        6.0        3.0
    0 days 00:02:20        6.0        3.0
    0 days 00:02:30        6.0        3.0
    0 days 00:02:40        6.0        3.0
    0 days 00:02:50        6.0        3.0
    0 days 00:03:00        6.0        3.0
    0 days 00:03:10        6.0        4.0
    0 days 00:03:20        6.0        4.0
    0 days 00:03:30        6.0        4.0
    0 days 00:03:40        6.0        4.0
    0 days 00:03:50        6.0        4.0
    0 days 00:04:00        6.0        4.0
    0 days 00:04:10        8.0        5.0
    0 days 00:04:20        8.0        5.0
    0 days 00:04:30        8.0        5.0
    0 days 00:04:40        8.0        5.0
    0 days 00:04:50        8.0        5.0
    0 days 00:05:00        8.0        5.0