python-3.x pandas geolocation latitude-longitude

How to calculate distance using latitude and longitude in a pandas dataframe?


I have a data frame with two columns, latitude and longitude, and 863 rows, so each row is a point defined by its latitude and longitude. I want to calculate the distance in kilometers between all the rows. I am using the reference linked below to get the distance between a single pair of latitude/longitude points. With only a few rows I could apply that approach by hand, but with this many rows I think I need a loop. Since I am new to Python, I have not been able to work out the looping logic.

Reference link: Getting distance between two points based on latitude/longitude

My data frame looks like this:

read_randomly_generated_lat_lon.head(3)
Lat          Lon
43.937845   -97.905537
44.310739   -97.588820
44.914698   -99.003517

Solution

  • Please note: The following script does not account for the curvature of the earth. Numerous documents, such as Convert lat/long to XY, explain this problem.

    However, the distance between coordinates can be roughly determined. The output is a Series, which can easily be concatenated with your original df to add a column showing the distance between each row and the previous one.

    import numpy as np
    import pandas as pd

    d = {
        'Lat': [43.937845, 44.310739, 44.914698],
        'Long': [-97.905537, -97.588820, -99.003517],
    }

    df = pd.DataFrame(d)

    R = 6371000  # mean radius of the earth (m)

    def to_xy(lat, lon, lat_0):
        # Flat (equirectangular) projection around the reference latitude lat_0.
        # phi is latitude and lam is longitude, both in radians.
        phi, lam = np.radians(lat), np.radians(lon)
        cos_phi_0 = np.cos(np.radians(lat_0))
        return R * lam * cos_phi_0, R * phi  # x (east-west), y (north-south) in m

    # Use the mean latitude of the data as the reference latitude
    df['X'], df['Y'] = to_xy(df['Lat'], df['Long'], df['Lat'].mean())

    # Difference between each row and the previous row
    diff = df[['X', 'Y']].diff()

    dist = np.sqrt(diff['X']**2 + diff['Y']**2)

    # Convert to km
    dist = dist / 1000

    # Row 0 is NaN (it has no previous row); for this sample the other two
    # rows come out at roughly 48.5 km and 131 km.
    print(dist)
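
If the curvature of the earth matters for your data, the haversine (great-circle) formula is a common alternative to the flat projection above. The following is a minimal sketch, not the method used in the script above: the function name `haversine_km`, the constant `EARTH_RADIUS_KM` (a mean radius of 6371 km) and the output names `dist_km` and `pairwise_km` are assumptions made for illustration, and the columns are assumed to be called `Lat` and `Lon` as in the question. It computes the distance from each row to the previous one, and also the distance between every pair of rows.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius in km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are decimal degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Sample rows taken from the question
df = pd.DataFrame({
    'Lat': [43.937845, 44.310739, 44.914698],
    'Lon': [-97.905537, -97.588820, -99.003517],
})

# Distance from each row to the previous row (NaN for the first row)
df['dist_km'] = haversine_km(df['Lat'].shift(), df['Lon'].shift(),
                             df['Lat'], df['Lon'])

# Distance between every pair of rows, using NumPy broadcasting:
# a column vector against a row vector yields an n x n matrix
lat = df['Lat'].to_numpy()
lon = df['Lon'].to_numpy()
pairwise_km = pd.DataFrame(
    haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :]),
    index=df.index, columns=df.index,
)

print(df)
print(pairwise_km.round(1))
```

No explicit loop is needed in either case. For the 863 rows in the question, the pairwise result is an 863 x 863 matrix whose entry at row i, column j is the distance between points i and j, with zeros on the diagonal.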