I have a dataframe for a protocol that tracks the value of 2 settings every minute of a test. For example:
In [1]: df = pd.DataFrame(
{
"time": [
pd.Timedelta(1, unit="min"),
pd.Timedelta(2, unit="min"),
pd.Timedelta(3, unit="min"),
pd.Timedelta(4, unit="min"),
pd.Timedelta(5, unit="min"),
],
"setting_1": [4.0, 4.0, 6.0, 6.0, 8.0],
"setting_2": [1.0, 2.0, 3.0, 4.0, 5.0],
}
).set_index("time")
In [2]: df.head()
Out[2]:
setting_1 setting_2
time
0 days 00:01:00 4.0 1.0
0 days 00:02:00 4.0 2.0
0 days 00:03:00 6.0 3.0
0 days 00:04:00 6.0 4.0
0 days 00:05:00 8.0 5.0
I need to join this dataframe with another which contains the outcome measures of the test but the data in that dataframe is sampled every 10s. Thus, I expand df
so the timedelta index increases by 10s and the missing values are backfilled.
In [3]: df = df.resample("10S").bfill()
In [4]: df.head()
Out[4]:
setting_1 setting_2
time
0 days 00:01:00 4.0 1.0
0 days 00:01:10 4.0 1.0
0 days 00:01:20 4.0 1.0
0 days 00:01:30 4.0 1.0
0 days 00:01:40 4.0 1.0
However, I want the index to start at a timedelta of 10s (i.e., 0 days 00:00:10
) rather than the 0 days 00:01:00
which is the minimum value in the csv file the data is imported from. As the values of setting_1
and setting_2
over that first minute are represented by the values at 1min, they should also be backfilled.
I currently solve this by concatenating a new dataframe containing a single row with an index of pd.Timedelta(10, unit="s")
and the column values corresponding to the 1 min row to the original df
. I can then use .resample().bfill()
as before to obtain what I need.
In [5]: df = pd.concat(
[
df,
pd.DataFrame(
{
"time": [pd.Timedelta(10, unit="s")],
"setting_1": [df.iloc[0, 0]
"setting_2": [df.iloc[0, 1]
}
).set_index("time")
]
)
In [6]: df
Out[6]:
setting_1 setting_2
time
0 days 00:01:00 4.0 1.0
0 days 00:02:00 4.0 2.0
0 days 00:03:00 6.0 3.0
0 days 00:04:00 6.0 4.0
0 days 00:05:00 8.0 5.0
0 days 00:00:10 4.0 1.0
In [7]: df = df.resample("10S").bfill()
In [8]: df.head()
Out[8]:
setting_1 setting_2
time
0 days 00:00:10 4.0 1.0
0 days 00:00:20 4.0 1.0
0 days 00:00:30 4.0 1.0
0 days 00:00:40 4.0 1.0
0 days 00:00:50 4.0 1.0
Is there a better way to achieve this without the need for the intermediary step of concatenating a dummy dataframe? I.e., is there a way to resample below the current minimum TimedeltaIndex by defining some sort of 'minimum' setting for the .resample()
method?
Rather reindex
with a new timedelta_range
idx = pd.timedelta_range('0', df.index.max(), freq='10s')
out = df.reindex(idx).bfill()
Output:
setting_1 setting_2
0 days 00:00:00 4.0 1.0
0 days 00:00:10 4.0 1.0
0 days 00:00:20 4.0 1.0
0 days 00:00:30 4.0 1.0
0 days 00:00:40 4.0 1.0
0 days 00:00:50 4.0 1.0
0 days 00:01:00 4.0 1.0
0 days 00:01:10 4.0 2.0
0 days 00:01:20 4.0 2.0
0 days 00:01:30 4.0 2.0
0 days 00:01:40 4.0 2.0
0 days 00:01:50 4.0 2.0
0 days 00:02:00 4.0 2.0
0 days 00:02:10 6.0 3.0
0 days 00:02:20 6.0 3.0
0 days 00:02:30 6.0 3.0
0 days 00:02:40 6.0 3.0
0 days 00:02:50 6.0 3.0
0 days 00:03:00 6.0 3.0
0 days 00:03:10 6.0 4.0
0 days 00:03:20 6.0 4.0
0 days 00:03:30 6.0 4.0
0 days 00:03:40 6.0 4.0
0 days 00:03:50 6.0 4.0
0 days 00:04:00 6.0 4.0
0 days 00:04:10 8.0 5.0
0 days 00:04:20 8.0 5.0
0 days 00:04:30 8.0 5.0
0 days 00:04:40 8.0 5.0
0 days 00:04:50 8.0 5.0
0 days 00:05:00 8.0 5.0