I am attempting to interpolate between time points for all data in a pandas dataframe. My current data is in time increments of 0.04 seconds. I want it to be in increments of 0.01 seconds to match another data set. I realize I can use the DataFrame.interpolate() function to do this. However, I am stuck on how to insert 3 rows of NaN in-between every row of my dataframe in an efficient manner.
import pandas as pd
import numpy as np
df = pd.DataFrame(data={"Time": [0.0, 0.04, 0.08, 0.12],
"Pulse": [76, 74, 77, 80],
"O2":[99, 100, 99, 98]})
df_ins = pd.DataFrame(data={"Time": [np.nan, np.nan, np.nan],
"Pulse": [np.nan, np.nan, np.nan],
"O2":[np.nan, np.nan, np.nan]})
I want df to transform from this:
Time Pulse O2
0 0.00 76 99
1 0.04 74 100
2 0.08 77 99
3 0.12 80 98
To something like this:
Time Pulse O2
0 0.00 76 99
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 0.04 74 100
5 NaN NaN NaN
6 NaN NaN NaN
7 NaN NaN NaN
8 0.08 77 99
9 NaN NaN NaN
10 NaN NaN NaN
11 NaN NaN NaN
12 0.12 80 98
Which I can then call on
df = df.interpolate()
Which would yield something like this (I'm making up the numbers here):
Time Pulse O2
0 0.00 76 99
1 0.01 76 99
2 0.02 75 99
3 0.03 74 100
4 0.04 74 100
5 0.05 75 100
6 0.06 76 99
7 0.07 77 99
8 0.08 77 99
9 0.09 77 99
10 0.10 78 98
11 0.11 79 98
12 0.12 80 98
I attempted to use an iterrows technique by inserting the df_ins frame after every row. But my index was thrown off during the iteration. I also tried slicing df and concatenating the df slices and df_ins, but once again the indexes were thrown off by the loop.
Does anyone have any recommendations on how to do this efficiently?
Use resample
here (replace ffill
with your desired behavior, maybe mess around with interpolate
)
df["Time"] = pd.to_timedelta(df["Time"], unit="S")
df.set_index("Time").resample("0.01S").ffill()
Pulse O2
Time
00:00:00 76 99
00:00:00.010000 76 99
00:00:00.020000 76 99
00:00:00.030000 76 99
00:00:00.040000 74 100
00:00:00.050000 74 100
00:00:00.060000 74 100
00:00:00.070000 74 100
00:00:00.080000 77 99
00:00:00.090000 77 99
00:00:00.100000 77 99
00:00:00.110000 77 99
00:00:00.120000 80 98
If you do want to interpolate:
df.set_index("Time").resample("0.01S").interpolate()
Pulse O2
Time
00:00:00 76.00 99.00
00:00:00.010000 75.50 99.25
00:00:00.020000 75.00 99.50
00:00:00.030000 74.50 99.75
00:00:00.040000 74.00 100.00
00:00:00.050000 74.75 99.75
00:00:00.060000 75.50 99.50
00:00:00.070000 76.25 99.25
00:00:00.080000 77.00 99.00
00:00:00.090000 77.75 98.75
00:00:00.100000 78.50 98.50
00:00:00.110000 79.25 98.25
00:00:00.120000 80.00 98.00