
Distance Matrix between rows of a Pandas Dataframe with Lat and Lon


I have a Pandas DataFrame with the coordinates of different cell towers where one column is the Latitude and another column is the Longitude like this:

        Tower_Id    Latitude    Longitude
    0   a1          x1          y1
    1   a2          x2          y2
    2   a3          x3          y3

and so on.

I need to get the distances between each cell tower and all the others, and subsequently between each cell tower and its closest neighbouring tower.

I have been trying to reuse some code that computes the distance between the location of a tower and its expected location, which I got from interpolation (in that case I had 4 different columns, 2 for the coordinates and 2 for the expected coordinates). The code I used is the following:

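import math

def haversine(row):
    # great-circle distance in km between (Lat, Lon) and (Expected_Lat, Expected_Lon)
    lon1 = row['Lon']
    lat1 = row['Lat']
    lon2 = row['Expected_Lon']
    lat2 = row['Expected_Lat']
    # convert degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # haversine formula
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c  # 6367 km: approximate Earth radius
    return km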

I have not been able to adapt this to compute the distance matrix of the cell towers in the DataFrame I have now. Can anybody help me with this?


Solution

  • SciPy's distance_matrix is essentially built on NumPy broadcasting, so here is a solution that applies the same idea to the haversine formula:

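    import numpy as np
    import pandas as pd

    # toy data (real latitudes lie in [-90, 90] and longitudes in [-180, 180];
    # the wider ranges sampled here only serve as toy input)
    lendf = 4
    np.random.seed(1)
    lats = np.random.uniform(0,180, lendf)
    np.random.seed(2)
    lons = np.random.uniform(0,360, lendf)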
    df = pd.DataFrame({'Tower_Id': range(lendf),
                       'Lat': lats,
                       'Lon': lons})
    df.head()
    #   Tower_Id    Lat         Lon
    #0  0           75.063961   156.958165
    #1  1           129.658409  9.333443
    #2  2           0.020587    197.878492
    #3  3           54.419863   156.716061
    
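    # x contains lat-lon values, converted from degrees to radians
    x = df[['Lat','Lon']].values * (np.pi/180.0)

    # squared sine of half the pairwise differences; broadcasting gives shape (n, n, 2)
    sine_diff = np.sin((x - x[:,None,:])/2)**2

    # cosine of lat
    lat_cos = np.cos(x[:,0])

    # haversine term for every pair of rows
    a = sine_diff[:,:,0] + lat_cos * lat_cos[:, None] * sine_diff[:,:,1]
    # pairwise distances in km (Earth radius taken as 6373 km)
    c = 2 * 6373 * np.arcsin(np.sqrt(a))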
    

    Output (c):

    array([[   0.        , 3116.76244275, 8759.2773379 , 2296.26375266],
           [3116.76244275,    0.        , 5655.63934703, 2239.2455718 ],
           [8759.2773379 , 5655.63934703,    0.        , 7119.00606308],
           [2296.26375266, 2239.2455718 , 7119.00606308,    0.        ]])
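
The question also asks for the distance from each tower to its closest neighbouring tower, which the matrix alone does not give directly. A minimal sketch of one way to get it is below. The helper name `haversine_matrix`, the output columns `Nearest_Id` and `Nearest_km`, and the four tower coordinates are all made up for illustration and are not part of the original answer. The idea is to set the diagonal (each tower's zero distance to itself) to infinity and then take the row-wise minimum.

```python
import numpy as np
import pandas as pd

def haversine_matrix(lat_deg, lon_deg, radius_km=6373.0):
    """Pairwise great-circle distances in km (hypothetical helper)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # pairwise differences via broadcasting, shape (n, n)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # clip guards against rounding pushing a marginally outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# hypothetical towers, invented for this example
towers = pd.DataFrame({
    'Tower_Id': ['a1', 'a2', 'a3', 'a4'],
    'Lat': [52.0, 52.1, 48.0, 48.05],
    'Lon': [4.0, 4.1, 11.0, 11.0],
})

dist = haversine_matrix(towers['Lat'], towers['Lon'])

# ignore each tower's zero distance to itself
masked = dist.copy()
np.fill_diagonal(masked, np.inf)

# position and distance of the closest other tower, row by row
nearest_pos = masked.argmin(axis=1)
towers['Nearest_Id'] = towers['Tower_Id'].to_numpy()[nearest_pos]
towers['Nearest_km'] = masked.min(axis=1)
print(towers)
```

The full matrix needs memory proportional to the square of the number of towers, so for a very large DataFrame a spatial index may be a better fit than this dense approach.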