
Returning zipcodes for longitude/latitude with geopy - avoiding GeocoderTimedOut: ('Service timed out', 'occurred at index ...')


This question is not new and has been discussed several times, but I am new to Python.

Geopy too slow - timeout all the time

Timeout error in Python geopy geocoder

I have a dataset of 11000 geolocations and would like to have their zipcodes.

My data looks like:

   Longitude   Latitude
0 -87.548627  41.728184
1 -87.737227  41.749111
2 -87.743974  41.924143
3 -87.659294  41.869314
4 -87.727808  41.877007

Using this question, I wrote a function that works for the first 10-20 rows but then raises a timeout error.

import geopy

# Create a function for zip code extraction; `row` is one row of the DataFrame
def get_zipcode(row, geolocator, lat_field, lon_field):
    location = geolocator.reverse((row[lat_field], row[lon_field]))
    return location.raw['address']['postcode']

geolocator = geopy.Nominatim(user_agent='my-application')

# Test a sample with 20 rows (bus_stops_geo holds the full dataset)
test = bus_stops_geo.head(20)

# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis=1, geolocator=geolocator,
                      lat_field='Latitude', lon_field='Longitude')

print(zipcodes)

0     60617
1     60652
2     60639
3     60607
4     60644
5     60659
6     60620
7     60626
8     60610
9     60660
10    60625
11    60645
12    60628
13    60620
14    60629
15    60628
16    60644
17    60638
18    60657
19    60631
dtype: object

I tried changing the timeout, but without success so far.
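For reference, geopy geocoders accept a `timeout` argument in seconds, both in the constructor (e.g. `Nominatim(user_agent='my-application', timeout=10)`) and per call in `reverse()`. A longer timeout alone may not be enough, though: the public Nominatim service allows at most about one request per second, so a fast `apply` loop is likely to be throttled. A common complement is to retry a timed-out request after a pause. Below is a minimal, stdlib-only sketch of retry with exponential backoff; `call_with_retries` and the `FlakyService` stub are hypothetical names for illustration, not part of geopy (geopy's own `RateLimiter` offers similar behavior through its `max_retries` and `error_wait_seconds` parameters).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[..., T],
    *args,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    initial_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs,
) -> T:
    """Call func(*args, **kwargs), retrying when it raises one of `retry_on`.

    The pause doubles after each failure (exponential backoff); once all
    attempts are used up, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = initial_wait
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            sleep(wait)
            wait *= 2
    raise RuntimeError("unreachable")  # the loop always returns or raises


class FlakyService:
    """Hypothetical stand-in for a geocoder that times out a few times first."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def reverse(self, point):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("Service timed out")
        return "60617"


# Record the pauses instead of actually sleeping, so the demo runs instantly.
waits = []
service = FlakyService(failures=2)
result = call_with_retries(
    service.reverse,
    (41.728184, -87.548627),
    retry_on=(TimeoutError,),
    sleep=waits.append,
)
print(result, service.calls, waits)
```

With geopy, the call would look like `call_with_retries(geolocator.reverse, (lat, lon), retry_on=(GeocoderTimedOut,))`, with `GeocoderTimedOut` imported from `geopy.exc`.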

My questions:

  • How can I do this for all 11,000 rows?
  • How can I rewrite this function to return not only the zip codes but also the original longitude and latitude?
  • Are there simple alternative solutions in other languages such as R, or in proprietary software (paid options are fine for me)?

Tremendously appreciate any help!


Solution

  • Using geopy with pandas is described in the docs: https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas

    Solution with geopy:

    In [1]: import pandas as pd
       ...:
       ...: df = pd.DataFrame([
       ...:     [-87.548627, 41.728184],
       ...:     [-87.737227, 41.749111],
       ...:     [-87.743974, 41.924143],
       ...:     [-87.659294, 41.869314],
       ...:     [-87.727808, 41.877007],
       ...: ], columns=["Longitude", "Latitude"])
    
    In [2]: from tqdm import tqdm
       ...: tqdm.pandas()
       ...:
       ...: from geopy.geocoders import Nominatim
       ...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
       ...:
       ...: from geopy.extra.rate_limiter import RateLimiter
       ...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
    
    In [3]: df['Location'] = df.progress_apply(
       ...:     lambda row: reverse((row['Latitude'], row['Longitude'])),
       ...:     axis=1
       ...: )
    100%|█████████████████████████████| 5/5 [00:06<00:00,  1.24s/it]
    
    In [4]: def parse_zipcode(location):
       ...:     if location and location.raw.get('address') and location.raw['address'].get('postcode'):
       ...:         return location.raw['address']['postcode']
       ...:     else:
       ...:         return None
       ...: df['Zipcode'] = df['Location'].apply(parse_zipcode)
    
    In [5]: df
    Out[5]:
       Longitude   Latitude                                           Location Zipcode
    0 -87.548627  41.728184  (Olive Harvey College South Chicago Learning C...   60617
    1 -87.737227  41.749111  (7900, South Kilpatrick Avenue, Chicago, Cook ...   60652
    2 -87.743974  41.924143  (4701, West Fullerton Avenue, Beat 2522, Belmo...   60639
    3 -87.659294  41.869314  (1301-1307, West Taylor Street, Near West Side...   60607
    4 -87.727808  41.877007  (4053, West Jackson Boulevard, West Garfield P...   60644
    

    If paid options work for you, consider using a geocoding service other than the free Nominatim, such as MapQuest or PickPoint.
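Because the answer adds `Zipcode` as a new column of the same DataFrame, the original `Longitude` and `Latitude` stay next to each result, which covers the question about returning the coordinates too. Below is a minimal sketch of that pattern as a reusable function, assuming `reverse` is any callable that takes a `(lat, lon)` tuple and returns either `None` or an object with a geopy-style `.raw` dict. The names `add_zipcodes`, `extract_postcode` and the offline stub `fake_reverse` are hypothetical, chosen so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Callable, Optional

import pandas as pd


def extract_postcode(location) -> Optional[str]:
    """Return the postcode from a geopy-style Location, or None if missing."""
    if location is None:
        return None
    return location.raw.get("address", {}).get("postcode")


def add_zipcodes(
    df: pd.DataFrame,
    reverse: Callable,
    lat_field: str = "Latitude",
    lon_field: str = "Longitude",
) -> pd.DataFrame:
    """Return a copy of df with an extra 'Zipcode' column; coordinates are kept."""
    zipcodes = [
        extract_postcode(reverse((lat, lon)))
        for lat, lon in zip(df[lat_field], df[lon_field])
    ]
    return df.assign(Zipcode=zipcodes)


# Hypothetical offline stub: looks up a fixed table instead of calling Nominatim.
_FAKE_POSTCODES = {
    (41.728184, -87.548627): "60617",
    (41.749111, -87.737227): "60652",
}


def fake_reverse(point):
    postcode = _FAKE_POSTCODES.get(point)
    if postcode is None:
        return None  # mimic a point the geocoder cannot resolve
    return SimpleNamespace(raw={"address": {"postcode": postcode}})


df = pd.DataFrame(
    [[-87.548627, 41.728184], [-87.737227, 41.749111], [0.0, 0.0]],
    columns=["Longitude", "Latitude"],
)
result = add_zipcodes(df, fake_reverse)
print(result)
```

In a real run you would pass `reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)` from the answer instead of `fake_reverse`. At one request per second, 11,000 rows take at least 11,000 seconds, a little over three hours.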