Tags: python, python-3.x, pandas, apply, geopy

Pandas function .apply() not passing arguments "ValueError: Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates."


Python masters, I'm trying to speed up my code with the pandas .apply() function, but I've hit a problem I don't know how to solve.
The main goal of the script is to loop over a DataFrame and compute the distance between two points on a map. For that I'm using the geopy library, and I wrote this function:

def distance_2points(lat1, long1, lat2, long2):
    coord1 = (lat1, long1)
    coord2 = (lat2, long2)
    results = distance.distance(coord1, coord2).km
    return results

When I test the function on its own it works with no issues, but when I use it with .apply() I get

ValueError: Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates.

Full code

from geopy import distance
import pandas as pd
from datetime import datetime
import time
startTime = datetime.now()
print(datetime.now() - startTime)
lat1 = 40.067982
long1 = -75.056641
def distance_2points(lat1, long1, lat2, long2):
    coord1 = (lat1, long1)
    coord2 = (lat2, long2)
    results = distance.distance(coord1, coord2).km
    return results
df = pd.read_csv('data.csv')
df['distance'] = df.apply(lambda row: distance_2points(lat1, long1, lat2=row['lat'], long2=row['long'] ), axis=1)
print(datetime.now() - startTime)

Could anyone please explain what the problem is?

Example data: https://docs.google.com/spreadsheets/d/11sahfFQcv_PcODUvFxe6ziY_TeBjDkfLCpf2baqEKck/edit?usp=sharing


Solution

  • Try passing the whole row to the function and reading the coordinates inside it:

    from datetime import datetime
    
    import pandas as pd
    from geopy import distance
    
    # Start a timer so the total run time can be printed at the end
    startTime = datetime.now()
    print(datetime.now() - startTime)
    
    # Fixed reference point
    lat1 = 40.067982
    long1 = -75.056641
    
    def distance_2points(row):
        # Distance in km from the reference point to this row's coordinates
        coord1 = (lat1, long1)
        coord2 = (row['lat'], row['long'])
        results = distance.distance(coord1, coord2).km
        return results
    
    df = pd.read_csv('data.csv')
    df['distance'] = df.apply(lambda row: distance_2points(row), axis=1)
    print(datetime.now() - startTime)
    

    In fact, you can further simplify this by applying the named function directly to your dataframe without using a lambda:

    df['distance'] = df.apply(distance_2points, axis=1)
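The error message itself reports (nan, nan, 0.0), which suggests that at least one row of the data has a missing lat/long, so the same ValueError may still appear after the rewrite above. Below is a minimal sketch of one way to guard against that, not a definitive fix. Assumptions: the DataFrame is a small made-up sample standing in for data.csv, `distance_from_origin` and `haversine_km` are hypothetical names, and `haversine_km` is a plain-Python stand-in used only so the sketch runs without geopy. In the real script that call would be `distance.distance(coord1, coord2).km`, as in the question.

```python
import math

import pandas as pd


def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in km. Stand-in for geopy's distance.distance(...).km."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(long2 - long1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def distance_from_origin(row, lat1, long1):
    # A row with a missing coordinate is what would raise the ValueError,
    # so return NaN for it instead of calling the distance function.
    if pd.isna(row["lat"]) or pd.isna(row["long"]):
        return float("nan")
    return haversine_km(lat1, long1, row["lat"], row["long"])


# Hypothetical sample in place of data.csv; the second row has no coordinates.
df = pd.DataFrame({
    "lat": [40.067982, None, 41.0],
    "long": [-75.056641, None, -75.0],
})

# List the rows that would have caused the error.
bad_rows = df[df[["lat", "long"]].isna().any(axis=1)]
print(bad_rows)

# Extra positional arguments are forwarded to the function through args=,
# so the reference point does not have to be a global variable.
df["distance"] = df.apply(distance_from_origin, axis=1, args=(40.067982, -75.056641))
print(df)
```

If rows without coordinates are not needed at all, dropping them first with `df.dropna(subset=['lat', 'long'])` is another option; returning NaN keeps the original row count instead.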