Tags: python, jupyter-notebook, geocoding, geopy

Why does geolocate not give me the right addresses?


I am analyzing a data set with addresses in Philadelphia, PA. To make use of them, I want to get the exact longitude and latitude of each address so I can show them on a map later.

I extracted the unique entries of the column as a list and wrote a loop to look up the longitude and latitude of each one, but it gives me the same coordinates for every entry, and sometimes coordinates that are outside of Philadelphia.

Here's what I did so far:

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="my_user_agent")
geocode = lambda query: geolocator.geocode("%s, Philadelphia PA" % query)

cities = list(philly["station_name"].unique())
for city in cities:
    address = city
    location = geolocator.geocode(address)
    if(location != None):
        philly["longitude"] = location.longitude
        philly["latitude"] = location.latitude

philly["coordinates"] = list(zip(philly["latitude"], philly["longitude"]))

Solution

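  • The loop in the question assigns each result to the whole column (philly["longitude"] = location.longitude), so every row ends up with the coordinates of the last station that was found. It also calls geolocator.geocode(address) directly instead of the geocode helper, so ", Philadelphia PA" is never appended to the query and matches outside Philadelphia can come back.

    If "philly" is a list of dictionary objects, you can iterate over the list and add the location properties to each record.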

    from geopy.geocoders import Nominatim
    
    philly = [{'station_name': '30th Street Station'}]
    
    geolocator = Nominatim(user_agent="my_user_agent")
    for row in philly:
        address = row["station_name"]
        location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
        if location:
            print(address)
            print(">>", location.longitude, location.latitude)
            row["longitude"] = location.longitude
            row["latitude"] = location.latitude
            row["coordinates"] = (location.longitude, location.latitude)
    print(philly)
    

    Output:

    30th Street Station
    >> -75.1821442 39.9552836
    [{'station_name': '30th Street Station', 'longitude': -75.1821442, 'latitude': 39.9552836, 'coordinates': (-75.1821442, 39.9552836)}]
    

    If you are working with a pandas DataFrame, you can iterate over each record and set the latitude, longitude and coordinates fields in it.

    You can do something like this:

    from geopy.geocoders import Nominatim
    import pandas as pd
    
    geolocator = Nominatim(user_agent="my_user_agent")
    
    philly = [{'station_name': '30th Street Station'}]
    df = pd.DataFrame(philly)
    
    # add empty location columns (object dtype) to the data frame
    df["latitude"] = None
    df["longitude"] = None
    df["coordinates"] = None
    
    for idx, row in df.iterrows():
        address = row.station_name
        location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
        if location:
            # write through df.at: the row yielded by iterrows() may be a copy,
            # so assigning to it is not guaranteed to update the data frame
            df.at[idx, "latitude"] = location.latitude
            df.at[idx, "longitude"] = location.longitude
            df.at[idx, "coordinates"] = (location.longitude, location.latitude)
    print(df)
    

    Output:

              station_name   latitude  longitude                coordinates
    0  30th Street Station  39.955284 -75.182144  (-75.1821442, 39.9552836)
    

    If you have a list with duplicate station names then you should cache the results so you don't make duplicate geolocation requests.
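
One way to do that caching is sketched below. The helper name `geocode_unique`, its `suffix` parameter and the (latitude, longitude) tuple order are assumptions made for illustration and are not part of geopy. The geocoder is passed in as a plain callable, so the helper only relies on the result having `latitude` and `longitude` attributes (or being `None` when nothing was found).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]  # assumed order: (latitude, longitude)


def geocode_unique(
    names: Iterable[str],
    geocode: Callable[[str], object],
    suffix: str = ", Philadelphia, PA",
) -> Dict[str, Optional[Coordinates]]:
    """Geocode each distinct name once and return a name -> coordinates map.

    `geocode` is any callable that takes a query string and returns an object
    with `latitude` and `longitude` attributes, or None if there is no match.
    """
    cache: Dict[str, Optional[Coordinates]] = {}
    for name in names:
        if name in cache:
            continue  # already looked up, so skip the duplicate request
        location = geocode(f"{name}{suffix}")
        # store None for misses as well, so they are not retried either
        cache[name] = (location.latitude, location.longitude) if location else None
    return cache
```

With the objects from the answer above, this could be called as `coords = geocode_unique(df["station_name"], lambda q: geolocator.geocode(q, country_codes="us"))`, and the result could then be attached with `df["coordinates"] = df["station_name"].map(coords)`. Stations that were not found end up with an empty value rather than a stale one.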