Search code examples
pythonpandaslatitude-longitudehaversine

Pandas speed calculation based on travelled distance and time


I have the following dataframe:

data = [
    [ID, '2022-04-23T03:36:26Z', 60, 10, 83],
    [ID, '2022-04-23T03:37:30Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:37:48Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:38:24Z', 61, 11, 72],
    [ID, '2022-04-23T03:44:20Z', 63, 13, 75],
    [ID, '2022-04-23T03:45:02Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:45:06Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:45:08Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:45:12Z', Nan, Nan, Nan],
    [ID, '2022-04-23T03:45:48Z', 69, 15, 61]
]

df = pd.DataFrame(data=data,
                  columns=['ID', 'time', 'latitude', 'longitude', 'speed')

The problem is that for some rows I have only the time value e.g. row 2 and 3. For these rows, I want to calculate the average speed based on time, latitude and longitude of the row preceding (row 1) and following (row 4) the Nan speed rows.

For example, the speed value in row 2 and 3 should be an average speed value which is based on the travelled distance (maybe using Haversine formula) divided by the total amount of time ('2022-04-23T03:38:24Z' - '2022-04-23T03:36:26Z').

How can I write this in Python?


Solution

  • pandas.DataFrame.interpolate may be what you're looking for if you're looking for a naive approach (there's other options if you're looking for something more specific just see the docs):

    df[["latitude", "longitude", "speed"]] = df.interpolate() \
        [["latitude", "longitude", "speed"]].round().astype(int)
    

    Result:

    ID                  time  latitude  longitude  speed
    0  ID  2022-04-23T03:36:26Z        60         10     83
    1  ID  2022-04-23T03:37:30Z        60         10     79
    2  ID  2022-04-23T03:37:48Z        60         10     75
    3  ID  2022-04-23T03:38:24Z        61         11     72
    4  ID  2022-04-23T03:44:20Z        63         13     75
    5  ID  2022-04-23T03:45:02Z        64         13     72
    6  ID  2022-04-23T03:45:06Z        65         13     69
    7  ID  2022-04-23T03:45:08Z        66         14     66
    8  ID  2022-04-23T03:45:12Z        67         14     63
    9  ID  2022-04-23T03:45:48Z        69         15     61