
Overwrite a single conditionally selected cell in a (geo)pandas (geo)dataframe with external data in tuple or geopy.location.Location format


I start with a pandas dataframe of historical places. I pass a column of place names to geopy for geocoding. I extract the coordinates, and turn them into points. I also save the geocoder's returned geopy.location.Location in a column for further use. This all seems to work fine if I run it on the whole (geo)dataframe.

The problem arises when I want to overwrite a few of the entries. For instance, I want to overwrite wherever the geocoder has tried and failed to locate 'Fargo N Dak' (the historical abbreviation) correctly. I can re-run the geocoder on a single, modernized place name and extract the data, but I can't figure out how to insert it in the original gdf.

import numpy as np
import pandas as pd
import geopandas as gpd

from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

#set up Nominatim geocoder with geopy package with a user_agent (required) and a rate limiter
Ngeocoder0 = Nominatim(user_agent="aGeocoder")
Ngeocoder = RateLimiter(Ngeocoder0.geocode, min_delay_seconds=1)

#make some toy data
lst = ['Minneapolis MN', 'Fargo ND', 'Fargo N Dak'] 
df = pd.DataFrame(lst,columns =['rawPOB'])

df['_TEMP'] = df['rawPOB'].apply(lambda x: Ngeocoder(x, language='en',addressdetails=True))

df['rawGCcoords']=df['_TEMP'].apply(lambda x: (x.point[1], x.point[0]) if x else None)
df['rawGClong']=df['_TEMP'].apply(lambda x: (x.point[1]) if x else None)
df['rawGClat']=df['_TEMP'].apply(lambda x: (x.point[0]) if x else None)

gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['rawGClong'], df['rawGClat']))


#so far so good, except Fargo N Dak is not recognized as Fargo North Dakota
#code just this place...
manualentry= Ngeocoder('Fargo North Dakota', language='en',addressdetails=True)
manualcoords=(manualentry.point[1], manualentry.point[0])

#... but here it all goes wrong when I try to insert it
gdf.loc[gdf.rawPOB == 'Fargo N Dak', '_TEMP'] = manualentry
gdf.loc[gdf.rawPOB == 'Fargo N Dak', 'rawGCcoords'] = manualcoords

#trying .at instead, and checking data type as in https://stackoverflow.com/questions/27949671/add-a-tuple-to-a-specific-cell-of-a-pandas-dataframe
gdf.dtypes
gdf.at[gdf.rawPOB == 'Fargo N Dak', 'rawGCcoords'] = manualcoords

# :(

What am I missing?

Thank you.

P.S. In this example I could of course clean up the data before I try to geocode it, but I am trying to build a workflow in which I could use the mapped, first geocoding results to look for less obvious mistakes.


Solution

  • The problem is most likely how you are indexing your GeoDataFrame. When you write

    gdf.loc[gdf.rawPOB == 'Fargo N Dak', '_TEMP'] you are using boolean indexing (gdf.rawPOB == '...'), which selects a pandas.Series rather than a single cell. Assigning an iterable such as a tuple to that selection makes pandas try to spread its elements over the selected rows, so the assignment fails (you should get an error like ValueError: Must have equal len keys and value when setting with an iterable, which isn't very helpful).

    I suggest setting the rawPOB column as the index of your GeoDataFrame. You can then get or set the value for a specific (index label, column name) pair with the DataFrame.at method. This works most simply when the label you target appears only once in rawPOB:

    gdf.set_index('rawPOB', inplace=True)
    
    gdf.at['Fargo N Dak', '_TEMP'] = manualentry
    gdf.at['Fargo N Dak', 'rawGCcoords'] = manualcoords
    

    Once you are done, you can restore the original index if needed:

    gdf.reset_index(inplace=True)
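
If you would rather not change the index, or if the same place name can appear in several rows, a possible alternative is to look up the index labels that match the condition and write to each cell with DataFrame.at. The sketch below uses a plain pandas DataFrame and made-up placeholder coordinates instead of real geocoder output, so the values and the object-dtype setup are assumptions for illustration only:

```python
import pandas as pd

# Toy data mirroring the question; the coordinates are placeholders,
# not values returned by a geocoder.
df = pd.DataFrame({"rawPOB": ["Minneapolis MN", "Fargo ND", "Fargo N Dak"]})
df["rawGCcoords"] = pd.Series(
    [(-93.26, 44.98), (-96.79, 46.88), None], dtype=object
)

# Hypothetical stand-in for the manually geocoded (long, lat) tuple.
manualcoords = (-96.79, 46.88)

# Boolean indexing yields the matching index labels; .at then writes
# the whole tuple into one cell at a time.
for label in df.index[df["rawPOB"] == "Fargo N Dak"]:
    df.at[label, "rawGCcoords"] = manualcoords
```

The column needs to have object dtype for a tuple to fit in a single cell, which is what the gdf.dtypes check in the question is about. The same loop should work for the '_TEMP' column, since it only ever touches one cell per assignment.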