I'm using the geopy package to look up addresses and get their coordinates, and the resulting column contains both the matched address and the coordinates.
I want to get just the coordinates.
Here is a test to show you how it works:
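from geopy.geocoders import Nominatim
# Assumption: Nominatim is the geocoder in use; geopy 2.x requires a user_agent string
geolocator = Nominatim(user_agent="my_geocoding_app")
# Test to see if a response is obtained for an easy address
location = geolocator.geocode("175 5th Avenue NYC", timeout=10)
print((location.latitude, location.longitude))
# Output: (40.7410861, -73.9896298241625)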
In my code I have a CSV of cities, which are then looked up using the geopy package:
data['geocode_result'] = [geolocator.geocode(x, timeout=60) for x in data['ghana_city']]
I want to get just the coordinates from this column.
Using str.extract doesn't work, though; it just returns NaN values, even though the regex seems fine:
p = r'(?P<latitude>-?\d+\.\d+)?(?P<longitude>-?\d+\.\d+)'
data[['g_latitude', 'g_longitude']] = data['geocode_result2'].str.extract(p, expand=True)
data
I have a feeling these problems come from the type of object geopy returns into the column.
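A minimal sketch of that suspicion, using a hypothetical FakeLocation class in place of geopy's Location so it runs without network access: pandas' .str methods only look inside real str elements, so a column of objects yields NaN rows whatever the regex. The pattern here is slightly adjusted to include the comma between the two numbers.

```python
import pandas as pd


class FakeLocation:
    """Hypothetical stand-in for geopy's Location: an object, not a str."""

    def __init__(self, address, latitude, longitude):
        self.address = address
        self.latitude = latitude
        self.longitude = longitude


# Regex with a comma separator between the two numbers
pattern = r"(?P<latitude>-?\d+\.\d+),\s*(?P<longitude>-?\d+\.\d+)"

objects = pd.Series([FakeLocation("Madina, Ghana", 5.6864962, -0.1677052)])
strings = pd.Series(["(Madina, Ghana, (5.6864962, -0.1677052))"])

# .str methods skip non-str elements and return NaN for them
from_objects = objects.str.extract(pattern)
from_strings = strings.str.extract(pattern)

print(from_objects)  # latitude and longitude are both NaN
print(from_strings)  # latitude 5.6864962, longitude -0.1677052 (as strings)
```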
The regex looks fine when I test it on Regexr.com.
I have tried converting the column to strings, but then the coordinates are dropped?!
data['geocode_result2'] = data['geocode_result2'].astype(str)
data
Can anyone help here? Thanks a lot!
Dummy data:
The column I want to extract the coordinates from is geocode_result2 (or geocode_result):
geocode_result2
1 (Agona Swedru, Central Region, Ghana, (5.534454, -0.700763))
2 (Madina, Adenta, Greater Accra Region, PMB 107 MD, Ghana, (5.6864962, -0.1677052))
3 (Ashaiman, Greater Accra Region, TM3 8AA, Ghana, (5.77329565, -0.110766330148484))
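Even once the values are real strings, the pattern in the question has no separator between its two groups and makes the latitude optional, so on text like these rows it would capture only one number (as the longitude). A minimal sketch with both groups required and a comma separator added, using only the standard re module; the sample rows are copied from the dummy data and parse_coords is a hypothetical helper name.

```python
import re

# Text shaped like the dummy geocode_result2 rows
rows = [
    "(Agona Swedru, Central Region, Ghana, (5.534454, -0.700763))",
    "(Madina, Adenta, Greater Accra Region, PMB 107 MD, Ghana, (5.6864962, -0.1677052))",
    "(Ashaiman, Greater Accra Region, TM3 8AA, Ghana, (5.77329565, -0.110766330148484))",
]

# Both groups required, separated by a comma and optional whitespace,
# starting right after the "(" that opens the coordinate pair
COORDS = re.compile(r"\((?P<latitude>-?\d+\.\d+),\s*(?P<longitude>-?\d+\.\d+)")


def parse_coords(text):
    """Return (latitude, longitude) as floats, or None if no pair is found."""
    m = COORDS.search(text)
    if m is None:
        return None
    return float(m.group("latitude")), float(m.group("longitude"))


for row in rows:
    print(parse_coords(row))
```

The same pattern can be passed to str.extract, but only on a column that actually holds such strings; reading the Location attributes directly avoids the round trip through text altogether.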
Final code to get coordinates:
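# Geocode every city; geocode() returns None when nothing is found
data['geocode_result'] = [geolocator.geocode(x, timeout=60) for x in data['ghana_city']]
# Drop rows with no match (None counts as missing for dropna)
data.dropna(subset=['geocode_result'], inplace=True)
# Read the coordinates straight from the Location objects' attributes
data['g_latitude'] = data['geocode_result'].apply(lambda loc: loc.latitude)
data['g_longitude'] = data['geocode_result'].apply(lambda loc: loc.longitude)
data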
geolocator.geocode returns a Location object rather than a string (although the way it is displayed does include the lat/long you were trying to parse), so the lat/long can be retrieved by accessing the location.latitude / location.longitude attributes respectively.
# Make geocoding requests
data['geocode_result'] = [geolocator.geocode(x, timeout = 60) for x in data['ghana_city']]
# Extract lat/long to separate columns
data['g_latitude'] = data['geocode_result'].apply(lambda loc: loc.latitude)
data['g_longitude'] = data['geocode_result'].apply(lambda loc: loc.longitude)
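geocode returns None when it finds no match, and the lambdas above would then raise AttributeError on those rows. The asker's final code avoids this by dropping them first; a minimal sketch of the alternative that keeps every row and leaves the coordinates missing, with a hypothetical FakeLocation standing in for geopy's Location and a hypothetical helper attr_or_none:

```python
import pandas as pd


class FakeLocation:
    """Hypothetical stand-in for geopy's Location (only the attributes used here)."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def attr_or_none(loc, name):
    """Return loc.<name>, or None when geocoding found no match (loc is None)."""
    return None if loc is None else getattr(loc, name)


# Hypothetical results: the second city was not found, so geocode gave None
data = pd.DataFrame({
    "ghana_city": ["Madina", "Not A Real Town"],
    "geocode_result": [FakeLocation(5.6864962, -0.1677052), None],
})

# Rows without a match get a missing value instead of raising AttributeError
data["g_latitude"] = data["geocode_result"].apply(attr_or_none, args=("latitude",))
data["g_longitude"] = data["geocode_result"].apply(attr_or_none, args=("longitude",))
print(data[["ghana_city", "g_latitude", "g_longitude"]])
```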
(I can't comment yet due to lack of reputation, so I'm addressing the confusion about the dropped coordinates here.)
str(location) returns a textual address (without coordinates), but repr(location) returns a string in the following format (which includes the coordinates):
Location(%(address)s, (%(latitude)s, %(longitude)s, %(altitude)s))
What you see when you print data is a third form: a Location is sequence-like (it iterates as (address, (latitude, longitude))), and pandas displays sequence-like values element by element, which is why the coordinates are visible while the leading Location type name and the altitude are not. But when the column is converted with astype(str), each value goes through str, which returns only the address, so the coordinates disappear. That's the whole magic here.
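A self-contained sketch of the str/repr difference, using a hypothetical FakeLocation class whose __str__ and __repr__ follow the behaviour described above (it does not import geopy, so the real class may differ in details):

```python
import pandas as pd


class FakeLocation:
    """Hypothetical stand-in mimicking how geopy's Location defines str() and repr()."""

    def __init__(self, address, latitude, longitude, altitude=0.0):
        self.address = address
        self.latitude = latitude
        self.longitude = longitude
        self.altitude = altitude

    def __str__(self):
        # Textual address only
        return self.address

    def __repr__(self):
        # Same layout as the Location(...) format string quoted above
        return "Location(%s, (%s, %s, %s))" % (
            self.address, self.latitude, self.longitude, self.altitude)


loc = FakeLocation("Agona Swedru, Central Region, Ghana", 5.534454, -0.700763)
print(str(loc))   # Agona Swedru, Central Region, Ghana
print(repr(loc))  # Location(Agona Swedru, Central Region, Ghana, (5.534454, -0.700763, 0.0))

s = pd.Series([loc])
print(s.astype(str)[0])  # astype(str) calls str() on each element: address only
print(s.apply(repr)[0])  # repr() keeps the coordinates
```

So if you really want text that still carries the coordinates, .apply(repr) keeps them where .astype(str) does not, though reading .latitude / .longitude directly is simpler and avoids parsing altogether.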