python pandas dataframe numpy linear-interpolation

Expand time series data in pandas dataframe

I am attempting to interpolate between time points for all data in a pandas dataframe. My current data is in time increments of 0.04 seconds. I want it to be in increments of 0.01 seconds to match another data set. I realize I can use the DataFrame.interpolate() function to do this. However, I am stuck on how to insert 3 rows of NaN in-between every row of my dataframe in an efficient manner.

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"Time": [0.0, 0.04, 0.08, 0.12], 
                        "Pulse": [76, 74, 77, 80],
                        "O2":[99, 100, 99, 98]})
df_ins = pd.DataFrame(data={"Time": [np.nan, np.nan, np.nan], 
                            "Pulse": [np.nan, np.nan, np.nan],
                            "O2":[np.nan, np.nan, np.nan]})

I want df to transform from this:

    Time    Pulse   O2
0   0.00    76      99
1   0.04    74      100
2   0.08    77      99
3   0.12    80      98

To something like this:

    Time    Pulse   O2
0   0.00    76      99
1   NaN     NaN     NaN
2   NaN     NaN     NaN
3   NaN     NaN     NaN
4   0.04    74      100
5   NaN     NaN     NaN
6   NaN     NaN     NaN
7   NaN     NaN     NaN
8   0.08    77      99
9   NaN     NaN     NaN
10  NaN     NaN     NaN
11  NaN     NaN     NaN
12  0.12    80      98

Which I can then call on

df = df.interpolate()

Which would yield something like this (I'm making up the numbers here):

    Time    Pulse   O2
0   0.00    76      99
1   0.01    76      99
2   0.02    75      99
3   0.03    74      100
4   0.04    74      100
5   0.05    75      100
6   0.06    76      99
7   0.07    77      99
8   0.08    77      99
9   0.09    77      99
10  0.10    78      98
11  0.11    79      98
12  0.12    80      98

I attempted to use an iterrows technique by inserting the df_ins frame after every row. But my index was thrown off during the iteration. I also tried slicing df and concatenating the df slices and df_ins, but once again the indexes were thrown off by the loop.

Does anyone have any recommendations on how to do this efficiently?

Solution

Use resample here (replace ffill with your desired behavior, maybe mess around with interpolate)

df["Time"] = pd.to_timedelta(df["Time"], unit="S")
df.set_index("Time").resample("0.01S").ffill()

                 Pulse   O2
Time
00:00:00            76   99
00:00:00.010000     76   99
00:00:00.020000     76   99
00:00:00.030000     76   99
00:00:00.040000     74  100
00:00:00.050000     74  100
00:00:00.060000     74  100
00:00:00.070000     74  100
00:00:00.080000     77   99
00:00:00.090000     77   99
00:00:00.100000     77   99
00:00:00.110000     77   99
00:00:00.120000     80   98

If you do want to interpolate:

df.set_index("Time").resample("0.01S").interpolate()

                 Pulse      O2
Time
00:00:00         76.00   99.00
00:00:00.010000  75.50   99.25
00:00:00.020000  75.00   99.50
00:00:00.030000  74.50   99.75
00:00:00.040000  74.00  100.00
00:00:00.050000  74.75   99.75
00:00:00.060000  75.50   99.50
00:00:00.070000  76.25   99.25
00:00:00.080000  77.00   99.00
00:00:00.090000  77.75   98.75
00:00:00.100000  78.50   98.50
00:00:00.110000  79.25   98.25
00:00:00.120000  80.00   98.00