Calculate distance between latitude and longitude in dataframe


I have 4 columns in my dataframe containing the following data:

  • Start_latitude
  • Start_longitude
  • Stop_latitude
  • Stop_longitude

I need to compute the distance between each start and stop latitude/longitude pair and store the result in a new column.

I came across a package (geopy) that can do this for me, but I need to pass it a tuple. How do I apply this geopy function across the dataframe in pandas for all the records?


Solution

  • I'd recommend you use pyproj instead of geopy. pyproj runs entirely locally, so it does not depend on an internet connection (in geopy it is the geocoders that call online services; its distance functions are also computed locally), and its Geod.inv call accepts whole arrays, which makes it fast on a dataframe. It is also more transparent about its methods (see here for instance), which are based on the Proj4 codebase that underlies essentially all open-source GIS software and, probably, many of the web services you'd use.

    #!/usr/bin/env python3
    
    import pandas as pd
    import numpy as np
    from pyproj import Geod
    
    # Distances are measured on the WGS84 ellipsoid, which is more accurate than a spherical method
    wgs84_geod = Geod(ellps='WGS84')
    
    # Get the distance (in metres) between pairs of lat-lon points
    def Distance(lat1, lon1, lat2, lon2):
        # Geod.inv takes longitude before latitude, so this argument order is correct
        az12, az21, dist = wgs84_geod.inv(lon1, lat1, lon2, lat2)
        return dist
    
    # Create test data: 100 random start and stop points
    lat1 = np.random.uniform(-90, 90, 100)
    lon1 = np.random.uniform(-180, 180, 100)
    lat2 = np.random.uniform(-90, 90, 100)
    lon2 = np.random.uniform(-180, 180, 100)
    
    # Package as a dataframe
    df = pd.DataFrame({'lat1': lat1, 'lon1': lon1, 'lat2': lat2, 'lon2': lon2})
    
    # Add/update a column in the data frame with the distances (in metres)
    df['dist'] = Distance(df['lat1'].tolist(), df['lon1'].tolist(), df['lat2'].tolist(), df['lon2'].tolist())
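
To apply the same idea to the column names from the question, a minimal sketch might look like the following. It assumes the four columns hold decimal degrees; the helper name add_distance_km, the output column distance_km, and the sample coordinates are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd
from pyproj import Geod

# WGS84 ellipsoid, as in the answer's code.
wgs84_geod = Geod(ellps="WGS84")

def add_distance_km(df):
    """Return a copy of df with a 'distance_km' column (hypothetical helper)."""
    # Geod.inv expects longitudes before latitudes and returns the
    # forward azimuth, back azimuth and distance in metres.
    _, _, dist_m = wgs84_geod.inv(
        df["Start_longitude"].to_numpy(),
        df["Start_latitude"].to_numpy(),
        df["Stop_longitude"].to_numpy(),
        df["Stop_latitude"].to_numpy(),
    )
    out = df.copy()
    out["distance_km"] = np.asarray(dist_m) / 1000.0  # metres -> kilometres
    return out

# Illustrative data only: one degree of longitude along the equator,
# and approximately London to Paris.
trips = pd.DataFrame({
    "Start_latitude": [0.0, 51.5074],
    "Start_longitude": [0.0, -0.1278],
    "Stop_latitude": [0.0, 48.8566],
    "Stop_longitude": [1.0, 2.3522],
})
result = add_distance_km(trips)
print(result)
```

Returning a copy keeps the original frame untouched; assigning the new column directly on df would work just as well if in-place modification is acceptable.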
    

    PyProj has some documentation here.