Search code examples
pandaspandas-groupbygeopandas

Calculate the distance travelled by each inspector using lat, long of tracking data using groupby pandas


I have a dataframe as shown below. Which is inspector tracking data (latitude and longitude) almost in each minute.

Inspector_ID   Timestamp               Latitude      Longitude
1              2018-07-24 7:31:00      100.491491    13.725239
1              2018-07-24 7:31:01      101.491491    15.725239
1              2018-07-24 7:32:04      104.491491    14.725239
1              2018-07-24 7:33:06      102.491491    10.725239
2              2018-07-24 8:35:08      105.491491    8.725239
2              2018-07-24 8:36:10      101.491491    15.725239
2              2018-07-24 8:37:09      101.491491    12.725239
2              2018-07-24 8:39:00      106.491491    16.725239

From the above data I would like find out the distance travelled by an inspector from each consecutive tracking data (latitude and longitude of the inspector).

Expected output (example)

Inspector_ID      Timestamp               Latitude      Longitude    Distance
    1              2018-07-24 7:31:00      100.491491    13.725239    nan
    1              2018-07-24 7:31:01      101.491491    15.725239    2.3
    1              2018-07-24 7:32:04      104.491491    14.725239    1.2
    1              2018-07-24 7:33:06      102.491491    10.725239    3.6
    2              2018-07-24 8:35:08      105.491491    8.725239     nan
    2              2018-07-24 8:36:10      101.491491    15.725239    5.6
    2              2018-07-24 8:37:09      101.491491    12.725239    2.1
    2              2018-07-24 8:39:00      106.491491    16.725239    3

Here I would like to calculate the distance groupby Inspector_ID.

Note: The number populated in Distance column are not the correct distance.

I am not aware, how to calculate distance using latitude and longitude. I am very new in pandas as well.


Solution

  • Use GroupBy.diff:

    df['distance']=df.groupby('Inspector_ID')[['Latitude','Longitude']].diff().pow(2).sum(axis=1,min_count=1).pow(1/2)
    print(df)
    
       Inspector_ID           Timestamp    Latitude  Longitude  distance
    0             1 2018-07-24 07:31:00  100.491491  13.725239       NaN
    1             1 2018-07-24 07:31:01  101.491491  15.725239  2.236068
    2             1 2018-07-24 07:32:04  104.491491  14.725239  3.162278
    3             1 2018-07-24 07:33:06  102.491491  10.725239  4.472136
    4             2 2018-07-24 08:35:08  105.491491   8.725239       NaN
    5             2 2018-07-24 08:36:10  101.491491  15.725239  8.062258
    6             2 2018-07-24 08:37:09  101.491491  12.725239  3.000000
    7             2 2018-07-24 08:39:00  106.491491  16.725239  6.403124
    

    if you want 0 instead of NaN remove min_count = 1

    • What is done here is to calculate the module of each vector constructed from consecutive points of each inspector