
How to get Geopandas to Focus only on USA Zip Codes?


I'm running the code below and getting non-US values in 'dropoff_location', 'dropoff_lat', and 'dropoff_lon' for some US zip codes. All of the zip codes are in the New York City area, so every 'dropoff_location', 'dropoff_lat', and 'dropoff_lon' should be in the New York City area too. Am I doing something wrong here?

import geopandas
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="ryan_app")
 
#applying the rate limiter wrapper
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode)
 
#Applying the method to pandas DataFrame
df['dropoff_location'] = df['dropoff_zip'].apply(geocode)
df['dropoff_lat'] = df['dropoff_location'].apply(lambda x: x.latitude if x else None)
df['dropoff_lon'] = df['dropoff_location'].apply(lambda x: x.longitude if x else None)
 
df.head()

Result:

        pickup_datetime     dropoff_datetime  trip_distance  fare_amount  pickup_zip  dropoff_zip     time_of_trip                                   dropoff_location  dropoff_lat  dropoff_lon
95  2016-02-02 14:00:28  2016-02-02 14:20:22           2.04         13.5       10001        10199  0 days 00:19:54  (Manhattan, New York County, City of New York,...    40.751528   -73.995849
96  2016-02-10 00:25:33  2016-02-10 00:30:09           1.03          5.5       10001        10011  0 days 00:04:36  (Manhattan, New York County, City of New York,...    40.740972   -73.999560
97  2016-02-19 09:19:18  2016-02-19 09:34:41           2.10         11.5       10002        10001  0 days 00:15:23  (Корольовський район, Житомир, Житомирська міс...    50.269960    28.702845
98  2016-02-12 21:14:59  2016-02-12 21:22:33           0.93          6.5       10011        10012  0 days 00:07:34  (Bechloul, Daïra Bechloul, Bouira, 10012, Algé...    36.312195     4.074957
99  2016-02-04 21:25:09  2016-02-04 21:35:38           1.70          9.0       10028        10065  0 days 00:10:29  (San Germano Chisone, Torino, Piemonte, 10065,...    44.894901     7.235602

Solution

  • Solution 1:

    You just need to limit the search results to a specific country (or a list of countries) by passing the country_codes argument to the geolocator.geocode method. Your code would look like this:

    import geopandas
    from geopy.geocoders import Nominatim
    geolocator = Nominatim(user_agent="ryan_app")
     
    
    # Series.apply forwards the extra keyword arguments to geolocator.geocode
    df['dropoff_location'] = df['dropoff_zip'].apply(geolocator.geocode, country_codes="US", timeout=1)
    df['dropoff_lat'] = df['dropoff_location'].apply(lambda x: x.latitude if x else None)
    df['dropoff_lon'] = df['dropoff_location'].apply(lambda x: x.longitude if x else None)
    
    print(df)
    

    Output:

       pickup_zip  dropoff_zip                                   dropoff_location  dropoff_lat  dropoff_lon
    0       10001        10199  (Manhattan, New York County, City of New York,...    40.751528   -73.995849
    1       10001        10011  (Manhattan, New York County, City of New York,...    40.740858   -73.999422
    2       10002        10001  (Manhattan, New York County, City of New York,...    40.748399   -73.994036
    3       10011        10012  (Manhattan, New York County, City of New York,...    40.725028   -73.998068
    4       10028        10065  (Manhattan, New York County, City of New York,...    40.766035   -73.964690
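
The call in Solution 1 drops the RateLimiter used in the question and sends one request per row. A minimal sketch of one way to keep both the rate limit and the country restriction is shown here. The helper names `build_us_geocoder` and `geocode_unique` are hypothetical, and `min_delay_seconds=1` is an assumption based on the public Nominatim server's limit of one request per second. Repeated zip codes are looked up only once.

```python
from functools import partial


def build_us_geocoder(user_agent="ryan_app"):
    """Return a rate-limited geocode callable restricted to the US (hypothetical helper)."""
    # Imported lazily so geocode_unique below can be used without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Assumption: wait at least one second between requests.
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)
    # RateLimiter passes keyword arguments through to geolocator.geocode.
    return partial(limited, country_codes="US")


def geocode_unique(zip_codes, geocode):
    """Geocode each distinct zip code once; return {zip: (lat, lon) or None}."""
    coords = {}
    for zip_code in zip_codes:
        if zip_code in coords:
            continue  # already looked up, so skip the extra request
        location = geocode(str(zip_code))
        coords[zip_code] = (location.latitude, location.longitude) if location else None
    return coords


# Possible usage with the DataFrame from the question (needs network access):
# coords = geocode_unique(df['dropoff_zip'], build_us_geocoder())
# df['dropoff_lat'] = df['dropoff_zip'].map(lambda z: coords[z][0] if coords[z] else None)
# df['dropoff_lon'] = df['dropoff_zip'].map(lambda z: coords[z][1] if coords[z] else None)
```

Passing the geocoder in as an argument keeps the lookup logic separate from the network client, which also makes it easy to try out with a stand-in function.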
    

  • Solution 2:

    You can also get a more detailed address by reverse geocoding the latitude and longitude once you've extracted them from the zip codes:

    import numpy as np
    import geopy
    geolocator = geopy.geocoders.Nominatim(user_agent="ryan_app")
    
    def reverse_geocoding(lat, lon):
        try:
            location = geolocator.reverse(geopy.point.Point(lat, lon))
            return location.raw['display_name']
        except Exception:
            # Treat any lookup failure as a missing address
            return None
        
    df['dropoff_location'] = df['dropoff_zip'].apply(geolocator.geocode, country_codes="US", timeout=1)
    df['dropoff_lat'] = df['dropoff_location'].apply(lambda x: x.latitude if x else None)
    df['dropoff_lon'] = df['dropoff_location'].apply(lambda x: x.longitude if x else None)
    # otypes=[object] keeps the results as Python objects (str or None)
    df['detailed_dropoff_address'] = np.vectorize(reverse_geocoding, otypes=[object])(df['dropoff_lat'], df['dropoff_lon'])
    
    print(df.head())
    

    Output:

       pickup_zip  dropoff_zip                                   dropoff_location  dropoff_lat  dropoff_lon                           detailed_dropoff_address
    0       10001        10199  (Manhattan, New York County, City of New York,...    40.751528   -73.995849  Moynihan Train Hall, West 31st Street, Chelsea...
    1       10001        10011  (Manhattan, New York County, City of New York,...    40.740858   -73.999422  224, West 17th Street, Chelsea District, Manha...
    2       10002        10001  (Manhattan, New York County, City of New York,...    40.748399   -73.994036  227, West 29th Street, Chelsea, Manhattan, New...
    3       10011        10012  (Manhattan, New York County, City of New York,...    40.725028   -73.998068  Self-Portrait, 158, Mercer Street, Manhattan C...
    4       10028        10065  (Manhattan, New York County, City of New York,...    40.766035   -73.964690  Church of St. Vincent Ferrer, East 66th Street...
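
Since every zip code in the question is expected to resolve to the New York City area, a quick sanity check on the returned coordinates can flag stray results. This is only a sketch: the bounding box values are rough assumptions rather than official boundaries, and `in_nyc` is a hypothetical helper name.

```python
# Approximate bounding box for New York City (assumed values, not from the answers).
NYC_BOUNDS = {"lat_min": 40.47, "lat_max": 40.92, "lon_min": -74.26, "lon_max": -73.70}


def in_nyc(lat, lon, bounds=NYC_BOUNDS):
    """Return True when (lat, lon) falls inside the rough NYC box."""
    if lat is None or lon is None:
        return False
    # NaN compares False against everything, so NaN coordinates also fail here.
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


# Coordinates taken from the outputs shown in the question.
print(in_nyc(40.751528, -73.995849))  # Manhattan result -> True
print(in_nyc(50.269960, 28.702845))   # stray result for zip 10001 -> False
```

Applied to each row's 'dropoff_lat' and 'dropoff_lon', a check like this would have flagged rows 97 to 99 in the question's output.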