This question is not new and has been discussed several times, but I am new to Python. Related questions:
Geopy too slow - timeout all the time
Timeout error in Python geopy geocoder
I have a dataset of 11,000 geolocations and would like to get their zip codes.
My data looks like:
   Longitude   Latitude
0 -87.548627  41.728184
1 -87.737227  41.749111
2 -87.743974  41.924143
3 -87.659294  41.869314
4 -87.727808  41.877007
Based on this question, I wrote a function that works for the first 10-20 rows but then fails with a timeout error.
import geopy

# Create a function for zip code extraction
def get_zipcode(df, geolocator, lat_field, lon_field):
    location = geolocator.reverse((df[lat_field], df[lon_field]))
    return location.raw['address']['postcode']

geolocator = geopy.Nominatim(user_agent='my-application')

# Test a sample with 20 rows
test = bus_stops_geo.head(20)

# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis=1, geolocator=geolocator,
                      lat_field='Latitude', lon_field='Longitude')
print(zipcodes)
0 60617
1 60652
2 60639
3 60607
4 60644
5 60659
6 60620
7 60626
8 60610
9 60660
10 60625
11 60645
12 60628
13 60620
14 60629
15 60628
16 60644
17 60638
18 60657
19 60631
dtype: object
I tried to increase the timeout, but without success so far.
My questions:
Tremendously appreciate any help!
Using geopy with pandas is described in the docs: https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas
Solution with geopy:
In [1]: import pandas as pd
   ...:
   ...: df = pd.DataFrame([
   ...:     [-87.548627, 41.728184],
   ...:     [-87.737227, 41.749111],
   ...:     [-87.743974, 41.924143],
   ...:     [-87.659294, 41.869314],
   ...:     [-87.727808, 41.877007],
   ...: ], columns=["Longitude", "Latitude"])

In [2]: from tqdm import tqdm
   ...: tqdm.pandas()
   ...:
   ...: from geopy.geocoders import Nominatim
   ...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
   ...:
   ...: from geopy.extra.rate_limiter import RateLimiter
   ...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

In [3]: df['Location'] = df.progress_apply(
   ...:     lambda row: reverse((row['Latitude'], row['Longitude'])),
   ...:     axis=1
   ...: )
100%|█████████████████████████████| 5/5 [00:06<00:00, 1.24s/it]
In [4]: def parse_zipcode(location):
   ...:     if location and location.raw.get('address') and location.raw['address'].get('postcode'):
   ...:         return location.raw['address']['postcode']
   ...:     else:
   ...:         return None
   ...: df['Zipcode'] = df['Location'].apply(parse_zipcode)
In [5]: df
Out[5]:
   Longitude   Latitude                                           Location Zipcode
0 -87.548627  41.728184  (Olive Harvey College South Chicago Learning C...   60617
1 -87.737227  41.749111  (7900, South Kilpatrick Avenue, Chicago, Cook ...   60652
2 -87.743974  41.924143  (4701, West Fullerton Avenue, Beat 2522, Belmo...   60639
3 -87.659294  41.869314  (1301-1307, West Taylor Street, Near West Side...   60607
4 -87.727808  41.877007  (4053, West Jackson Boulevard, West Garfield P...   60644
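On the timeout itself: geopy's default request timeout is 1 second, which the public Nominatim server can easily exceed. The following is a minimal sketch, not a definitive setup: it passes a larger `timeout` to `Nominatim` and lets `RateLimiter` retry failed requests. The helper names `build_reverse` and `estimated_minutes` are hypothetical, and the values used (10 s timeout, 3 retries, 5 s wait between retries) are assumptions to tune, not recommendations from the geopy docs.

```python
def build_reverse(user_agent, timeout=10, min_delay_seconds=1.0, max_retries=3):
    """Return a rate-limited reverse-geocoding callable with a longer timeout.

    Hypothetical helper. geopy is imported inside the function so the rest
    of this sketch can be run without geopy installed.
    """
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    # timeout is in seconds and applies to every request made by this geocoder.
    geolocator = Nominatim(user_agent=user_agent, timeout=timeout)

    return RateLimiter(
        geolocator.reverse,
        min_delay_seconds=min_delay_seconds,  # pause between consecutive requests
        max_retries=max_retries,              # retry timeouts and other service errors
        error_wait_seconds=5.0,               # wait before each retry (assumed value)
        swallow_exceptions=True,              # after the last retry, do not raise
        return_value_on_exception=None,       # ...return None for that row instead
    )


def estimated_minutes(n_rows, min_delay_seconds=1.0):
    """Lower bound on the run time: one request per row, spaced by the delay."""
    return n_rows * min_delay_seconds / 60


# 11000 rows at one request per second need at least about 183 minutes.
print(round(estimated_minutes(11000)))
```

With this, `reverse = build_reverse("specify_your_app_name_here")` can stand in for the `reverse` defined in `In [2]`. Rows that still fail after the retries come back as `None`, which `parse_zipcode` already handles.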
If paid options work for you, consider a geocoding service other than the free public Nominatim server, which limits clients to at most one request per second. MapQuest and PickPoint are two such alternatives.