Search code examples
pandasgoogle-maps-api-3

Using Google Maps API to find street intersection latitude and longitude coordinates based on a list in Pandas dataframe


I am working through my first data project have a dataframe showing street intersections that looks like this:

        from_station_name           count
86  Canal St & Adams St         50575
152 Clinton St & Madison St         5990
157 Clinton St & Washington Blvd    45378
159 Columbus Dr & Randolph St   31370
252 Franklin St & Monroe St         30832
321 Kingsbury St & Kinzie St    30654
175 Daley Center Plaza          30423
89  Canal St & Madison St           27138
410 Michigan Ave & Washington St    25468
330 LaSalle St & Jackson Blvd   23021

I'm trying to find the latitude and longitude coordinates of the intersections and have that information placed in two additional columns of this dataframe.

I tried to write a loop using the Google Maps API like this:

def geocode_intersection(api_key, intersection):
    base_url = "https://maps.googleapis.com/maps/api/geocode/json?"
    params = {"address": intersection, "key": api_key}
    response = requests.get(base_url, params=params)
    if response.status_code == 200:
        data = json.loads(response.text)
        if data["status"] == "OK":
            location = data["results"][0]["geometry"]["location"]
            return location["lat"], location["lng"]
        else:
            return None
    else:
        return None


api_key = "MY_API_KEY"

# Initialize new columns
busiest_stations_subs["latitude"] = np.nan
busiest_stations_subs["longitude"] = np.nan

# Loop through the DataFrame
for index, row in busiest_stations_subs.iterrows():
    intersection = row["from_station_name"]
    result = geocode_intersection(api_key, intersection)
    if result is not None:
        latitude, longitude = result
        busiest_stations_subs.at[index, "latitude"] = latitude
        busiest_stations_subs.at[index, "longitude"] = longitude

And got many intersections correctly, but many are NaN like so:

580            Wabash Ave & Roosevelt Rd  15914  41.861202  -87.943284
544                 State St & Kinzie St  15764        NaN         NaN
191               Damen Ave & Pierce Ave  15546  41.909363  -87.677420
339           Lake Shore Dr & North Blvd  15520  28.107321  -82.729723
235             Fairbanks Ct & Grand Ave  15477  64.840051 -147.719976
424                  Morgan St & Lake St  15307        NaN         NaN
499        Sheffield Ave & Fullerton Ave  15129  41.925373  -87.653611
582            Wacker Dr & Washington St  15010        NaN         NaN
619             Wilton Ave & Belmont Ave  14990  41.939913  -87.652890
122              Clark St & Armitage Ave  14881  41.918477  -87.636128
559              Streeter Dr & Grand Ave  14879        NaN         NaN
329             LaSalle St & Illinois St  14412  41.890819  -87.632670
197          Dearborn Pkwy & Delaware Pl  14144  41.898748  -87.629835

What I learned was that the intersections with NaN have street names that are missing the N, E, W, S directional indicators of the actual street. I'm able to look on Google Maps and see the the missing directional indicators, however there are hundreds of thousands of rows in my data. Is there a way to automate looking at all the NaN and finding what the actual street names are and replacing the names in the dataframe? One issue was that a street like "Lake St" in Chicago has two versions, a "W Lake St" and an "E Lake St". I'm not exactly sure how to get around this problem. Any help or suggestions on how to approach this solution would be appreciated.


Solution

  • The addresses are ambiguous that the Geocoder could not return any results

    The Geocoding Addresses Best Practices documentation says that:

    In general, use the Geocoding API when geocoding complete addresses (for example, “48 Pirrama Rd, Pyrmont, NSW, Australia”). Use the Places API Place Autocomplete service when geocoding ambiguous (incomplete)

    In your case, these addresses are incomplete and therefore the Geocoding API is working as intended. I tried geocoding the addresses that returned NaN and confirmed that it is indeed returning ZERO_RESULTS.

    You could try either ways:

    1. Improve search quality by querying non-ambiguous addresses

      • If you have no automated way of doing this, then this would be a rigorous task (as you have mentioned in your question).
      • For example, changing the address State St & Kinzie St by adding Chicago at the end (State St & Kinzie St Chicago) will return a result: 41.889261,-87.627984 instead of ZERO_RESULTS.
    2. Use Places API Place Autocomplete as mentioned by the documentation

      • With this one, the cost of your requests can increase given that you are running a for loop and autocomplete could cost you more than geocoding (depending on how you implement it).
      • But with this one, it is possible for the Autocomplete to return multiple results and you still need a checker if it is the correct one or not.

    As it is still considered as best practice to geocode only unambiguous addresses, what I could only think of as a solution for now is to just clean-up your data.

    With that said, I still hope this information helps!