I start with a pandas dataframe of historical places. I pass a column of place names to geopy for geocoding. I extract the coordinates, and turn them into points. I also save the geocoder's returned geopy.location.Location in a column for further use. This all seems to work fine if I run it on the whole (geo)dataframe.
The problem arises when I want to overwrite a few of the entries. For instance, I want to overwrite wherever the geocoder has tried and failed to locate 'Fargo N Dak' (the historical abbreviation) correctly. I can re-run the geocoder on a single, modernized place name and extract the data, but I can't figure out how to insert it in the original gdf.
import numpy as np
import pandas as pd
import geopandas as gpd
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
#set up Nominatim geocoder with geopy package with a user_agent (required) and a rate limiter
Ngeocoder0 = Nominatim(user_agent="aGeocoder")
Ngeocoder = RateLimiter(Ngeocoder0.geocode, min_delay_seconds=1)
#make some toy data
lst = ['Minneapolis MN', 'Fargo ND', 'Fargo N Dak']
df = pd.DataFrame(lst,columns =['rawPOB'])
df['_TEMP'] = df['rawPOB'].apply(lambda x: Ngeocoder(x, language='en',addressdetails=True))
df['rawGCcoords']=df['_TEMP'].apply(lambda x: (x.point[1], x.point[0]) if x else None)
df['rawGClong']=df['_TEMP'].apply(lambda x: (x.point[1]) if x else None)
df['rawGClat']=df['_TEMP'].apply(lambda x: (x.point[0]) if x else None)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['rawGClong'], df['rawGClat']))
#so far so good, except Fargo N Dak is not recognized as Fargo North Dakota
#code just this place...
manualentry= Ngeocoder('Fargo North Dakota', language='en',addressdetails=True)
manualcoords=(manualentry.point[1], manualentry.point[0])
#... but here it all goes wrong when I try to insert
gdf.loc[gdf.rawPOB == 'Fargo N Dak', '_TEMP'] = manualentry
gdf.loc[gdf.rawPOB == 'Fargo N Dak', 'rawGCcoords'] = manualcoords
#trying .at instead, and checking data type as in https://stackoverflow.com/questions/27949671/add-a-tuple-to-a-specific-cell-of-a-pandas-dataframe
gdf.dtypes
gdf.at[gdf.rawPOB == 'Fargo N Dak', 'rawGCcoords'] = manualcoords
# :(
What am I missing?
Thank you.
P.S. In this example I could of course clean up the data before I try to geocode it, but I am trying to build a workflow in which I could use the mapped, first geocoding results to look for less obvious mistakes.
I guess it's due to how you are indexing your GeoDataFrame. When you are doing
gdf.loc[gdf.rawPOB == 'Fargo N Dak', '_TEMP']
you get a pandas.Series in return, because boolean indexing with gdf.rawPOB == '..' selects a set of rows rather than a single cell. pandas then treats a tuple such as manualcoords as a sequence of values to spread over the selected rows, so the assignment you are trying can't work, and you should get an error like ValueError: Must have equal len keys and value when setting with an iterable (which isn't that helpful).
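To see the failure without calling the geocoder, here is a minimal sketch on a plain pandas DataFrame. The coordinates are made-up placeholders standing in for real geocoder output, and the exact error text may differ between pandas versions:

```python
import pandas as pd

# Toy stand-in for the geocoded frame; the coordinates are placeholders,
# not real geocoder results.
df = pd.DataFrame({
    'rawPOB': ['Minneapolis MN', 'Fargo ND', 'Fargo N Dak'],
    'rawGCcoords': [(-93.27, 44.98), (-96.79, 46.88), None],
    'rawGClong': [-93.27, -96.79, None],
})
manualcoords = (-96.79, 46.88)  # placeholder (longitude, latitude)

# The boolean mask selects a set of rows (here exactly one), not a single cell.
mask = df['rawPOB'] == 'Fargo N Dak'
n_selected = int(mask.sum())

caught = None
try:
    # pandas tries to spread the 2 tuple elements over the 1 selected row.
    df.loc[mask, 'rawGCcoords'] = manualcoords
except ValueError as err:
    caught = err

print(n_selected, type(caught).__name__, caught)
```

The point is only that the mask yields one row while the tuple has two elements, and pandas refuses to match them up.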
What I suggest is that you set your rawPOB column as the index of your GeoDataFrame (this works most directly when the place names are unique); then you will be able to easily set or get the value for a specific (index, column name) pair using the DataFrame.at accessor, like so:
gdf.set_index('rawPOB', inplace=True)
gdf.at['Fargo N Dak', '_TEMP'] = manualentry
gdf.at['Fargo N Dak', 'rawGCcoords'] = manualcoords
Once you are done, you can reset the index to restore the original layout if needed:
gdf.reset_index(inplace=True)
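If the same place name can appear in several rows, which seems likely in real data, an alternative is to keep the default integer index and use .at once per matching row. The following is only a sketch on a plain pandas DataFrame with made-up placeholder coordinates, and it assumes the rawGCcoords column has object dtype so that a cell can hold a tuple:

```python
import pandas as pd

# Toy stand-in for the geocoded frame; coordinates are placeholders, and
# 'Fargo N Dak' appears twice to mimic repeated place names.
df = pd.DataFrame({
    'rawPOB': ['Minneapolis MN', 'Fargo N Dak', 'Fargo ND', 'Fargo N Dak'],
    'rawGCcoords': [(-93.27, 44.98), None, (-96.79, 46.88), None],
    'rawGClong': [-93.27, None, -96.79, None],
    'rawGClat': [44.98, None, 46.88, None],
})
manualcoords = (-96.79, 46.88)  # placeholder (longitude, latitude)

mask = df['rawPOB'] == 'Fargo N Dak'

# Tuples go in one cell at a time, addressed by the unique integer index label.
for label in df.index[mask]:
    df.at[label, 'rawGCcoords'] = manualcoords

# Plain numbers are scalars, so ordinary boolean indexing works for them.
df.loc[mask, 'rawGClong'] = manualcoords[0]
df.loc[mask, 'rawGClat'] = manualcoords[1]

print(df)
```

The same loop should work for the _TEMP column with manualentry. In the GeoDataFrame from the question, the geometry column is not updated by any of these assignments, so it would presumably need to be rebuilt afterwards, for example with gpd.points_from_xy on the corrected rawGClong and rawGClat columns.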