python interpolation

Interpolating data from a DataFrame with multiple columns, including a datetime column


I have a lot of data from a CSV file which looks like this:

MMSI,   BaseDateTime,        LAT,       LON,      SOG, COG
111,    2023-01-01T00:01:19, 27.3538,  -94.6253,  0.1, 35.3
111,    2023-01-01T00:04:18, 27.35372, -94.6253,  0.1, 18.3
111,    2023-01-01T00:07:19, 27.35372, -94.62534, 0.1, 290.0
111,    2023-01-01T00:10:19, 27.35374, -94.62538, 0.1, 249.5
111,    2023-01-01T00:16:18, 27.35376, -94.62543, 0.1, 225.5
1056261,2023-01-01T00:00:12, 26.11815, -80.14815, 0.0, 300.4
1056261,2023-01-01T00:01:21, 26.11817, -80.14821, 0.0, 291.8
1056261,2023-01-01T00:02:32, 26.11814, -80.14817, 0.0, 284.0
1056261,2023-01-01T00:03:41, 26.11815, -80.14819, 0.0, 288.9

MMSI is effectively an ID for a boat.

My problem is that the intervals in BaseDateTime are very uneven. Sometimes the gap between rows is only a few minutes, while at other times it is much longer.

This does not work for me as I want to have data points for, let us say, every single minute instead. Is it possible for me to use interpolation for this?

I have already tried doing this with the pandas interpolation but it seems to be outdated.


Solution

  • Here's an example of resampling the COG column to one-minute intervals and interpolating the missing values.

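    import pandas as pd

    # resample() needs a DatetimeIndex, so parse the timestamps first
    cog_interpolated = pd.Series(df["COG"].values, index=pd.to_datetime(df["BaseDateTime"].values))
    cog_interpolated = cog_interpolated.resample("min").mean()  # "min" replaces the deprecated "T" alias
    
    # Fill the empty one-minute bins, weighting by elapsed time
    cog_interpolated = cog_interpolated.interpolate(method="time")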
    

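The question also mentions several columns and one MMSI per boat, so the same idea could be extended to all numeric columns, handled boat by boat so that tracks are not mixed. The following is only a rough sketch under some assumptions: the sample rows are copied from the question with the padding after the commas removed, `resample_one_boat` is a hypothetical helper name, and `"min"` is used as the one-minute frequency alias.

```python
import io
import pandas as pd

# Sample rows from the question (padding after the commas removed).
csv_text = """MMSI,BaseDateTime,LAT,LON,SOG,COG
111,2023-01-01T00:01:19,27.3538,-94.6253,0.1,35.3
111,2023-01-01T00:04:18,27.35372,-94.6253,0.1,18.3
111,2023-01-01T00:07:19,27.35372,-94.62534,0.1,290.0
1056261,2023-01-01T00:00:12,26.11815,-80.14815,0.0,300.4
1056261,2023-01-01T00:01:21,26.11817,-80.14821,0.0,291.8
1056261,2023-01-01T00:02:32,26.11814,-80.14817,0.0,284.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["BaseDateTime"])


def resample_one_boat(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: put one boat's track on a 1-minute grid."""
    numeric = group.set_index("BaseDateTime")[["LAT", "LON", "SOG", "COG"]]
    per_minute = numeric.resample("min").mean()    # empty minutes become NaN
    return per_minute.interpolate(method="time")   # fill NaN by elapsed time


# Process each MMSI separately, then stitch the pieces back together.
pieces = {mmsi: resample_one_boat(g) for mmsi, g in df.groupby("MMSI")}
result = pd.concat(pieces, names=["MMSI", "BaseDateTime"]).reset_index()
print(result.head())
```

Note that `resample("min")` assigns each reading to the start of its minute, so timestamps are rounded down before interpolation. Also, COG is an angle, so plain linear interpolation can give misleading values when consecutive readings straddle the 0/360 boundary; that column may need separate handling.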