
Distance between two GPS coordinates


I have a dataframe with GPS data (longitude and latitude) and a tripId. I want to calculate the distance between consecutive GPS coordinates (each row and the next one) for each tripId. Is it possible to add a new column "Distance" that contains the results (each trip would yield one fewer distance than it has rows)?

-   timestamp           longitude   latitude    tripId 
0   2021-04-30 21:13:53 8.211610    53.189479   1790767 
1   2021-04-30 21:13:54 8.211462    53.189479   1790767 
2   2021-04-30 21:13:55 8.211367    53.189476   1790767 
3   2021-04-30 21:13:56 8.211343    53.189479   1790767 
4   2021-04-30 21:13:57 8.211335    53.189490   1790767 
5   2021-04-30 21:13:59 8.211338    53.189491   1790767 
6   2021-04-30 21:14:00 8.211299    53.189479   1790767 
7   2021-04-30 21:14:01 8.211311    53.189468   1790767 
8   2021-04-30 21:14:02 8.211327    53.189446   1790767 
9   2021-04-30 21:14:03 8.211338    53.189430   1790767

I've tested it on the first 10 rows, but it still doesn't work:

    import math

    def haversine(coord1, coord2):
        R = 6372800 # Earth radius in meters
        lat1, lon1 = coord1
        lat2, lon2 = coord2

        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlambda = math.radians(lon2 - lon1)

        a = math.sin(dphi/2)**2 + \
            math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2

        return 2*R*math.atan2(math.sqrt(a), math.sqrt(1 - a))


    x= df.tripId[0]

    for i in range(0,10):
        while(df.tripId[i]== x):
            coord1= df.latitude[i], df.longitude[i]
            coord2= df.latitude[i+1], df.longitude[i+1]
            df.distance=haversine(coord1, coord2)

Solution

  • The haversine module already contains a function that can process vectors directly. As your input data is already a dataframe, you should use haversine_vector. You can compute the distance column directly with it, even if your dataframe contains more than one tripId value:

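    import haversine
    import pandas as pd

    def calc_dist(df):
        # distance between each row and the previous one (NaN for a trip's first row)
        s = pd.Series(haversine.haversine_vector(df, df.shift()),
                      index=df.index, name='distance')
        return pd.DataFrame(s)

    # group_keys=False keeps the original index so the result aligns in concat
    df = pd.concat([df, df.groupby('tripId', group_keys=False)[['latitude', 'longitude']].apply(calc_dist)],
                   axis=1)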
    

    From your sample data, it gives (distances are in kilometers, the default unit of haversine_vector):

    -            timestamp  longitude   latitude   tripId  distance
    0  2021-04-30 21:13:53   8.211610  53.189479  1790767       NaN
    1  2021-04-30 21:13:54   8.211462  53.189479  1790767  0.009860
    2  2021-04-30 21:13:55   8.211367  53.189476  1790767  0.006338
    3  2021-04-30 21:13:56   8.211343  53.189479  1790767  0.001633
    4  2021-04-30 21:13:57   8.211335  53.189490  1790767  0.001334
    5  2021-04-30 21:13:59   8.211338  53.189491  1790767  0.000229
    6  2021-04-30 21:14:00   8.211299  53.189479  1790767  0.002921
    7  2021-04-30 21:14:01   8.211311  53.189468  1790767  0.001461
    8  2021-04-30 21:14:02   8.211327  53.189446  1790767  0.002668
    9  2021-04-30 21:14:03   8.211338  53.189430  1790767  0.001924
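
If installing the haversine package is not an option, the same per-trip calculation can probably be done with NumPy and pandas alone. The following is only a minimal sketch under some assumptions: the helper name add_step_distance, the column name distance_m, and the second tripId in the sample (1790768) are made up for illustration, and the Earth radius is an assumed mean value, so results may differ slightly from those of the haversine package. It shifts the coordinates within each tripId so that the first row of every trip gets NaN, and returns meters rather than kilometers.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def add_step_distance(df):
    """Return a copy of df with a 'distance_m' column (hypothetical name):
    haversine distance in meters from the previous row of the same tripId.
    The first row of each trip has no predecessor and gets NaN."""
    out = df.copy()
    # Previous point within each trip; shifting per group avoids
    # measuring a distance across two different trips.
    prev = out.groupby("tripId")[["latitude", "longitude"]].shift()

    phi1 = np.radians(prev["latitude"])
    phi2 = np.radians(out["latitude"])
    dphi = phi2 - phi1
    dlambda = np.radians(out["longitude"] - prev["longitude"])

    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlambda / 2) ** 2
    out["distance_m"] = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    return out


# First three rows come from the question; the second trip is invented
# to show that distances do not leak across tripId boundaries.
sample = pd.DataFrame({
    "longitude": [8.211610, 8.211462, 8.211367, 8.211610, 8.211462],
    "latitude": [53.189479, 53.189479, 53.189476, 53.189479, 53.189479],
    "tripId": [1790767, 1790767, 1790767, 1790768, 1790768],
})

result = add_step_distance(sample)
print(result)

# Total distance per trip; sum() skips the NaN of each trip's first row.
totals = result.groupby("tripId")["distance_m"].sum()
print(totals)
```

Shifting inside groupby is what keeps the trips separate here: a plain df.shift() on the whole dataframe would compare the first point of one trip with the last point of the previous trip.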