So I was analyzing a data set with addresses in Philadelphia, PA. To make use of them, I want to get the exact longitude and latitude of each one so I can show them on a map later.
I took the unique entries of the `station_name` column as a list and wrote a loop to get the longitude and latitude for each one, but it gives me the same coordinates for every row, and sometimes even coordinates outside of Philadelphia.
Here's what I did so far:
```python
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="my_user_agent")
geocode = lambda query: geolocator.geocode("%s, Philadelphia PA" % query)
cities = list(philly["station_name"].unique())
for city in cities:
    address = city
    location = geolocator.geocode(address)
    if(location != None):
        philly["longitude"] = location.longitude
        philly["latitude"] = location.latitude
philly["coordinates"] = list(zip(philly["latitude"], philly["longitude"]))
```
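Your loop has two problems. First, `philly["longitude"] = location.longitude` assigns a single value to the whole column, so every row ends up with the coordinates of whichever station was geocoded last. Second, the loop calls `geolocator.geocode(address)` with the bare station name (the `geocode` lambda that appends "Philadelphia PA" is never used), so Nominatim can match a place with the same name somewhere else. Put the city and state in the query and store the result on each record individually.

If `philly` is a list of dictionaries, you can iterate over the list and add the location properties to each record: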
```python
from geopy.geocoders import Nominatim

philly = [{'station_name': '30th Street Station'}]
geolocator = Nominatim(user_agent="my_user_agent")

for row in philly:
    address = row["station_name"]
    # include the city and state in the query and restrict results to the US
    location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
    if location:
        print(address)
        print(">>", location.longitude, location.latitude)
        # store the result on this record only
        row["longitude"] = location.longitude
        row["latitude"] = location.latitude
        row["coordinates"] = (location.longitude, location.latitude)

print(philly)
```
Output:
```
30th Street Station
>> -75.1821442 39.9552836
[{'station_name': '30th Street Station', 'longitude': -75.1821442, 'latitude': 39.9552836, 'coordinates': (-75.1821442, 39.9552836)}]
```
If you are working with a pandas DataFrame, iterate over its rows and write the latitude, longitude and coordinates back into the DataFrame by index with `df.at`. Assigning to the `row` returned by `iterrows()` is not guaranteed to update the DataFrame, because depending on the data types `row` can be a copy rather than a view.
You can do something like this:
```python
from geopy.geocoders import Nominatim
import pandas as pd

geolocator = Nominatim(user_agent="my_user_agent")
philly = [{'station_name': '30th Street Station'}]
df = pd.DataFrame(philly)

# add empty location columns to the data frame
df["latitude"] = float("nan")
df["longitude"] = float("nan")
df["coordinates"] = None

for idx, row in df.iterrows():
    address = row.station_name
    location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
    if location:
        # write through df.at; assigning to `row` may only modify a copy
        df.at[idx, "latitude"] = location.latitude
        df.at[idx, "longitude"] = location.longitude
        df.at[idx, "coordinates"] = (location.longitude, location.latitude)

print(df)
```
Output:
```
          station_name   latitude  longitude                coordinates
0  30th Street Station  39.955284 -75.182144  (-75.1821442, 39.9552836)
```
If your data contains duplicate station names, cache the results so you don't send the same geocoding request more than once.
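A minimal sketch of that caching using only the standard library: `functools.lru_cache` memoizes the lookup per station name, so repeated names (and names that fail to geocode and return `None`) are only requested once. To keep the sketch runnable offline, `fake_geocode` and `FakeLocation` are hypothetical stand-ins for `geolocator.geocode` and geopy's `Location`; in real code you would call `geolocator.geocode(query, country_codes="us")` inside `lookup_station` instead.

```python
from functools import lru_cache
from typing import NamedTuple, Optional, Tuple


class FakeLocation(NamedTuple):
    """Hypothetical stand-in for geopy's Location (only the fields used here)."""
    latitude: float
    longitude: float


# Records every query sent to the "geocoder" so the caching effect is visible.
queries_sent = []


def fake_geocode(query: str) -> Optional[FakeLocation]:
    """Hypothetical offline replacement for geolocator.geocode."""
    queries_sent.append(query)
    known = {
        "30th Street Station, Philadelphia, PA": FakeLocation(39.9552836, -75.1821442),
    }
    return known.get(query)  # None when nothing matches, like geopy


@lru_cache(maxsize=None)
def lookup_station(station_name: str) -> Optional[Tuple[float, float]]:
    """Geocode one station name; repeated names are answered from the cache."""
    location = fake_geocode(f"{station_name}, Philadelphia, PA")
    if location is None:
        return None  # misses are cached too, so they are not retried
    return (location.longitude, location.latitude)


stations = ["30th Street Station", "Unknown Stop", "30th Street Station", "Unknown Stop"]
coordinates = [lookup_station(name) for name in stations]

print(coordinates)
print(len(queries_sent), "geocoding requests for", len(stations), "rows")
```

With a real geocoder behind `lookup_station`, a DataFrame column can be filled with `df["coordinates"] = df["station_name"].map(lookup_station)`, which queries each unique name once. Nominatim's public server also asks for at most one request per second; geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can wrap `geolocator.geocode` to enforce a delay between calls.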