Tags: python, pandas, numpy, haversine

How to calculate haversine distances between two pandas DataFrames using the haversine library


Here's how I use the haversine library to calculate the distance between two points:

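import haversine as hs
# haversine expects each point as (lat, lon); a latitude of 106 is out of range,
# so the original (lon, lat) tuples are swapped here
hs.haversine((-1.94091666666667, 106.11333888888888), (5.204783, 96.698661))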
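
For reference, the formula behind a call like the one above can be sketched with only the standard library. This is a minimal illustration, not the haversine library's actual implementation: the function name `haversine_km` is hypothetical, and the 6371 km Earth radius is an assumption taken from the sklearn snippet in this question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, as in the sklearn snippet


def haversine_km(point1, point2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, point1)
    lat2, lon2 = map(math.radians, point2)
    # haversine of the central angle between the two points
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Row 0 of df1 against row 6 of df2, both given as (lat, lon)
d = haversine_km((-6.347778, 107.071969), (-6.291131, 107.179119))
print(round(d, 3))
```

The point to notice is the argument order: latitude first, then longitude. Passing (lon, lat) still produces a number, just not the right one.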

Here's how to calculate haversine distances using sklearn:

from sklearn.metrics.pairwise import haversine_distances
import numpy as np
import pandas as pd
# haversine_distances expects [lat, lon] in radians
radian_1 = np.radians(df1[['lat','lon']])
radian_2 = np.radians(df2[['lat','lon']])
# multiply by the Earth radius (6371 km) to convert the result to kilometres
D = pd.DataFrame(haversine_distances(radian_1,radian_2)*6371,index=df1.index, columns=df2.index)

What I need is to do the same thing, but using the haversine library instead of sklearn.metrics.pairwise.

Here's my dataset df1

   index       lon        lat
0   0   107.071969  -6.347778
1   1   110.431361  -7.773489
2   2   111.978469  -8.065442

and dataset df2

    index      lon        lat
5   5   112.340919  -7.520442
6   6   107.179119  -6.291131
7   7   106.807442  -6.437383

Here's the expected output

        5           6           7
    0  596.019968   13.413123   30.882602
    1  212.317223  394.942014  426.564799
    2   72.573637  565.020998  598.409848

Solution

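  • You can use itertools.product to build every (df1 row, df2 row) pair and call haversine on each pair. Note that haversine expects each point as (lat, lon), so select the columns in that order rather than passing df.values directly:

    import haversine as hs
    import pandas as pd
    import numpy as np
    import itertools

    # haversine expects (lat, lon), so pick the columns in that order
    p1 = df1[['lat', 'lon']].values
    p2 = df2[['lat', 'lon']].values

    res = []
    for a, b in itertools.product(p1, p2):
        res.append(hs.haversine(a, b))  # kilometres by default

    # one row per df1 point, one column per df2 point (need not be square)
    df = pd.DataFrame(np.asarray(res).reshape(len(p1), len(p2)),
                      index=df1.index, columns=df2.index)
    print(df)

    The output originally posted with this answer, shown below, came from passing the rows in (lon, lat) order, which is why it differs from the expected output. With (lat, lon) order the values should agree with the sklearn result, up to any small difference in the Earth radius each library uses.

    Output as originally posted:

                0           1           2
    0  587.500555   12.058061   29.557005
    1  212.580742  365.487782  405.718803
    2   46.333180  537.684789  578.072579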
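
If the Python loop becomes a bottleneck, the same pairwise table can be sketched with NumPy broadcasting. This swaps in a hand-written haversine formula rather than the haversine library, so treat it as an illustration under stated assumptions: the function name `pairwise_haversine_km` is hypothetical, the 6371 km radius is taken from the sklearn snippet in the question, and the frames are rebuilt inline from the question's data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed radius, same constant as the sklearn snippet

# Data copied from the question
df1 = pd.DataFrame({"lon": [107.071969, 110.431361, 111.978469],
                    "lat": [-6.347778, -7.773489, -8.065442]}, index=[0, 1, 2])
df2 = pd.DataFrame({"lon": [112.340919, 107.179119, 106.807442],
                    "lat": [-7.520442, -6.291131, -6.437383]}, index=[5, 6, 7])


def pairwise_haversine_km(a, b):
    """Distance in km from every row of `a` to every row of `b` ('lat'/'lon' in degrees)."""
    # Column vectors for a, row vectors for b, so arithmetic broadcasts to (len(a), len(b))
    lat1 = np.radians(a["lat"].to_numpy())[:, None]
    lon1 = np.radians(a["lon"].to_numpy())[:, None]
    lat2 = np.radians(b["lat"].to_numpy())[None, :]
    lon2 = np.radians(b["lon"].to_numpy())[None, :]
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))
    return pd.DataFrame(dist, index=a.index, columns=b.index)


D = pairwise_haversine_km(df1, df2)
print(D.round(3))
```

Because the result keeps df1's index as rows and df2's index as columns, it has the same layout as the expected output in the question.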