Python Masters
I'm trying to speed up my code with the pandas .apply() function.
However, I'm facing a problem that I don't understand how to solve.
The main goal of the script is to loop over a DataFrame and determine the distance between 2 points on a map. For that I'm using the geopy library and built this function:
def distance_2points(lat1, long1, lat2, long2):
    coord1 = (lat1, long1)
    coord2 = (lat2, long2)
    results = distance.distance(coord1, coord2).km
    return results
When I test the function directly it works with no issues, but when I try to use it with .apply() I get:
ValueError: Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates.
Full code:
from geopy import distance
import pandas as pd
from datetime import datetime
import time
startTime = datetime.now()
print(datetime.now() - startTime)
lat1 = 40.067982
long1 = -75.056641
def distance_2points(lat1, long1, lat2, long2):
    coord1 = (lat1, long1)
    coord2 = (lat2, long2)
    results = distance.distance(coord1, coord2).km
    return results
df = pd.read_csv('data.csv')
df['distance'] = df.apply(lambda row: distance_2points(lat1, long1, lat2=row['lat'], long2=row['long']), axis=1)
print(datetime.now() - startTime)
Could anyone please explain what the problem is?
Example of the data: https://docs.google.com/spreadsheets/d/11sahfFQcv_PcODUvFxe6ziY_TeBjDkfLCpf2baqEKck/edit?usp=sharing
Try this:
from geopy import distance
import pandas as pd
from datetime import datetime
import time
startTime = datetime.now()
print(datetime.now() - startTime)
lat1 = 40.067982
long1 = -75.056641
def distance_2points(row):
    coord1 = (lat1, long1)
    coord2 = (row['lat'], row['long'])
    results = distance.distance(coord1, coord2).km
    return results
df = pd.read_csv('data.csv')
df['distance'] = df.apply(lambda row: distance_2points(row), axis=1)
print(datetime.now() - startTime)
In fact, you can further simplify this by applying the named function directly to your DataFrame without using a lambda:
df['distance'] = df.apply(distance_2points, axis=1)
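One more thing about the ValueError itself: the message "(nan, nan, 0.0) has been passed as coordinates" means some rows of the CSV have missing lat/long values, and geopy refuses non-finite points. The refactor alone won't make that go away if the data really contains NaN rows. A minimal sketch of filtering such rows before .apply(), using a small hypothetical frame in place of your data.csv:

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the second row has a missing latitude,
# which is exactly what shows up as "(nan, nan, 0.0)" inside geopy.
df = pd.DataFrame({'lat': [40.0, None], 'long': [-75.0, -75.1]})

# Drop rows with missing coordinates so every row passed to the
# distance function is finite.
clean = df.dropna(subset=['lat', 'long'])
print(len(clean))  # number of usable rows
```

You could also keep the NaN rows and have the distance column be NaN for them, e.g. by returning float('nan') from the function when a coordinate is missing, but dropping them up front is the simplest fix.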
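Since the original goal was speed, note that .apply() with axis=1 still calls the function once per row in plain Python, so it won't be much faster than a loop. If an approximate great-circle distance is acceptable instead of geopy's geodesic calculation, a vectorized haversine over the whole column is typically far faster. This is a sketch under that assumption, reusing the lat/long column names; haversine_km is a helper written here, not part of geopy:

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle (haversine) distance in km: an approximation to
    # geodesic distance, but fully vectorized over numpy arrays.
    r = 6371.0088  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Small sample frame standing in for data.csv.
df = pd.DataFrame({'lat': [40.0, 41.0], 'long': [-75.0, -76.0]})

# One vectorized call computes the whole column at once -- no per-row Python.
df['distance'] = haversine_km(40.067982, -75.056641, df['lat'], df['long'])
```

Haversine assumes a spherical Earth, so it can differ from geopy's geodesic result by a fraction of a percent; whether that matters depends on your use case.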