I have a lot of data from a CSV file which looks like this:
MMSI, BaseDateTime, LAT, LON, SOG, COG
111, 2023-01-01T00:01:19, 27.3538, -94.6253, 0.1, 35.3
111, 2023-01-01T00:04:18, 27.35372, -94.6253, 0.1, 18.3
111, 2023-01-01T00:07:19, 27.35372, -94.62534, 0.1, 290.0
111, 2023-01-01T00:10:19, 27.35374, -94.62538, 0.1, 249.5
111, 2023-01-01T00:16:18, 27.35376, -94.62543, 0.1, 225.5
1056261,2023-01-01T00:00:12, 26.11815, -80.14815, 0.0, 300.4
1056261,2023-01-01T00:01:21, 26.11817, -80.14821, 0.0, 291.8
1056261,2023-01-01T00:02:32, 26.11814, -80.14817, 0.0, 284.0
1056261,2023-01-01T00:03:41, 26.11815, -80.14819, 0.0, 288.9
MMSI is effectively an ID for a boat.
My problem is that the intervals between BaseDateTime values are very uneven. Sometimes the gap is only a few minutes, while other times it is much longer.
This does not work for me, as I want a data point for, say, every minute instead. Is it possible to use interpolation for this?
I have already tried doing this with the pandas interpolation but it seems to be outdated.
Here's an example of resampling the COG column to one-minute intervals and interpolating the missing values:
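import pandas as pd
# Assumes df holds rows for a single MMSI; parse the timestamps so resample() gets a DatetimeIndex
cog_interpolated = pd.Series(df["COG"].values, index=pd.to_datetime(df["BaseDateTime"].values))
cog_interpolated = cog_interpolated.resample("min").mean()  # "min" replaces the "T" alias, which is deprecated in recent pandas
cog_interpolated = cog_interpolated.interpolate(method="time")  # fill empty minutes linearly in time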
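Since the file mixes several boats, the same idea probably needs to be applied per MMSI, otherwise positions from different boats get averaged into the same minute. Below is a minimal sketch of one way to do that, assuming the column names from the question; the helper name `resample_track` and the inline sample data are only for illustration. Rather than averaging per minute, it interpolates between the original timestamps and then keeps only the whole-minute points. COG is left out on purpose: it is an angle, so plain linear interpolation can give misleading values when the course crosses 0/360 degrees.

```python
import io
import pandas as pd

# Small sample in the same layout as the question's CSV (rows copied from it).
csv_text = """MMSI,BaseDateTime,LAT,LON,SOG,COG
111,2023-01-01T00:01:19,27.3538,-94.6253,0.1,35.3
111,2023-01-01T00:04:18,27.35372,-94.6253,0.1,18.3
111,2023-01-01T00:07:19,27.35372,-94.62534,0.1,290.0
1056261,2023-01-01T00:00:12,26.11815,-80.14815,0.0,300.4
1056261,2023-01-01T00:01:21,26.11817,-80.14821,0.0,291.8
1056261,2023-01-01T00:02:32,26.11814,-80.14817,0.0,284.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["BaseDateTime"])


def resample_track(track: pd.DataFrame, freq: str = "min") -> pd.DataFrame:
    """Interpolate one boat's LAT/LON/SOG onto a regular time grid (hypothetical helper)."""
    track = track.set_index("BaseDateTime").sort_index()
    # Drop repeated timestamps so the index is unique before reindexing.
    track = track[~track.index.duplicated(keep="first")]
    # Regular grid inside the observed time span, aligned to whole minutes.
    grid = pd.date_range(
        track.index.min().ceil(freq), track.index.max().floor(freq), freq=freq
    )
    # Add the grid timestamps to the original ones, interpolate linearly
    # in time, then keep only the grid points.
    combined = track[["LAT", "LON", "SOG"]].reindex(track.index.union(grid))
    return combined.interpolate(method="time").loc[grid]


# One regular track per boat, stacked under a (MMSI, BaseDateTime) index.
result = pd.concat(
    {mmsi: resample_track(track) for mmsi, track in df.groupby("MMSI")},
    names=["MMSI", "BaseDateTime"],
)
print(result)
```

Keeping the original timestamps in the index during interpolation means each one-minute value is weighted by how close it is to the real observations on either side. For long gaps you may want to pass `limit=` to `interpolate` so that the boat is not assumed to move in a straight line for hours.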