Contradictory (to me) ValueError message in geopandas `sjoin` and `sjoin_nearest`: GeoDataFrame demanded although GeoDataFrame supplied


I have two data sets with point data in 'lat' and 'lon' columns. The same points of interest were mapped by two different people at two different times, and I want to check their consistency. Many points in the first data set therefore correspond (to within a few meters) to a point in the second. For a small sample that is easy to check visually by plotting the two sets on top of each other, but with several hundred points I need a spatial join. I found the sjoin_nearest method in geopandas.

I was on geopandas 0.9, but sjoin_nearest was only added in 0.10, so I updated using conda:

The following packages will be UPDATED:

  ca-certificates                     2023.05.30-hecd8cb5_0 --> 2024.3.11-hecd8cb5_0 
  certifi                           2023.5.7-py39hecd8cb5_0 --> 2024.6.2-py39hecd8cb5_0 
  geopandas          conda-forge/noarch::geopandas-0.9.0-p~ --> pkgs/main/osx-64::geopandas-0.12.2-py39hecd8cb5_0 
  geopandas-base     conda-forge/noarch::geopandas-base-0.~ --> pkgs/main/osx-64::geopandas-base-0.12.2-py39hecd8cb5_0 
  openssl                                 1.1.1u-hca72f7f_0 --> 1.1.1w-hca72f7f_0 

Now, I'm getting the following error from my code:

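import pandas as pd
import geopandas as gpd
from shapely.geometry import Point  # required by the Point(...) calls below
# Build point GeoDataFrames from the 'lon'/'lat' columns (WGS 84)
df1 = pd.read_stata('df1.dta')
df1 = gpd.GeoDataFrame(df1, geometry = [Point(xy) for xy in zip(df1['lon'], df1['lat'])], crs="EPSG:4326")
df2 = pd.read_stata('df2.dta')
df2 = gpd.GeoDataFrame(df2, geometry = [Point(xy) for xy in zip(df2['lon'], df2['lat'])], crs="EPSG:4326")
# Nearest-neighbour join; max_distance is in CRS units (degrees for EPSG:4326)
df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')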

Traceback (most recent call last):

  File "<ipython-input-15-d6ba3cf8492f>", line 6, in <module>
    df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')

  File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/geodataframe.py", line 2173, in sjoin_nearest
    return geopandas.sjoin_nearest(

  File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
    _basic_checks(left_df, right_df, how, lsuffix, rsuffix)

  File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
    raise ValueError(

ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>

I saw similarly titled questions here and here, but in both cases the inputs passed to the routine were not GeoDataFrames (and the ValueError correctly pointed that out). In my case, both inputs are GeoDataFrames, so the ValueError seems to contradict itself. Checking type(df1) and type(df2) confirms that both are GeoDataFrames, and routine GeoDataFrame tasks like df1.plot() work just fine. I also checked the sjoin_nearest() documentation for any deprecation warnings, but came up empty-handed. Any leads appreciated.
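Printing type(df1) shows only the class name. It does not show whether that is the very class object the raising module compares against. A minimal sketch of a stricter check follows. The helper describe_class_identity is hypothetical (not part of geopandas), and it assumes the installed distribution has the same name as the top-level import:

```python
import importlib
from collections import OrderedDict
from importlib import metadata


def describe_class_identity(obj, module_name, class_name):
    """Compare obj's class with the class a module currently exposes.

    Hypothetical helper for illustration; not a geopandas API.
    """
    module = importlib.import_module(module_name)
    current_cls = getattr(module, class_name)

    # Assumption: the distribution name equals the top-level package name.
    top_level = module_name.split(".")[0]
    try:
        installed = metadata.version(top_level)
    except metadata.PackageNotFoundError:
        installed = None

    return {
        "same_name": type(obj).__name__ == current_cls.__name__,
        "same_object": type(obj) is current_cls,
        "isinstance": isinstance(obj, current_cls),
        # Version of the code already loaded into this session, if exposed
        "loaded_version": getattr(importlib.import_module(top_level), "__version__", None),
        # Version recorded on disk by the package manager
        "installed_version": installed,
    }


# Stand-in usage with a standard-library class so the sketch runs anywhere
report = describe_class_identity(OrderedDict(), "collections", "OrderedDict")
print(report)
```

In the session from the question, the analogous call would be describe_class_identity(df1, "geopandas", "GeoDataFrame"). If same_name is True while same_object is False, or if the loaded and installed versions differ, the session is probably holding modules loaded before the update. A clean result does not rule this out, since the module that raises the error may hold its own reference to the class.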

Edit: MRE (from the sjoin_nearest documentation)

import geopandas as gpd
import geodatasets
groceries = gpd.read_file(
    geodatasets.get_path("geoda.groceries")
)
chicago = gpd.read_file(
    geodatasets.get_path("geoda.chicago_health")
).to_crs(groceries.crs)
groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)

Traceback (most recent call last):

  File "<ipython-input-25-75e25adffbfd>", line 9, in <module>
    groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)

  File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
    _basic_checks(left_df, right_df, how, lsuffix, rsuffix)

  File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
    raise ValueError(

ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>

Solution

  • This appears to have been a conda update hiccup. I had not restarted Spyder after the installation because the sjoin_nearest command was already being recognized after the update. Closing Spyder, deactivating the environment, activating it again, restarting Spyder, and running the identical code worked.
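One plausible reading of the self-contradictory message, offered as an assumption rather than a confirmed diagnosis: a session that was not restarted can hold two distinct class objects that share the name GeoDataFrame, one loaded before the update and one after. isinstance compares class objects, not names, so the check fails while the message prints a familiar name. The sketch below reproduces that effect in plain Python. define_geodataframe and basic_check are hypothetical stand-ins, not geopandas code:

```python
def define_geodataframe():
    """Stand-in for loading a module: each call creates a fresh class object."""
    class GeoDataFrame:
        pass
    return GeoDataFrame


# Two separate class objects with the same name, as if the package had been
# loaded once before the update and once after it (assumption).
OldGeoDataFrame = define_geodataframe()
NewGeoDataFrame = define_geodataframe()


def basic_check(left_df, expected_cls):
    """Simplified imitation of a type guard; not the real geopandas check."""
    if not isinstance(left_df, expected_cls):
        raise ValueError(
            f"'left_df' should be GeoDataFrame, got {type(left_df)}"
        )


df = OldGeoDataFrame()   # object built from the "old" class
message = None
try:
    basic_check(df, NewGeoDataFrame)   # guard compares against the "new" class
except ValueError as exc:
    message = str(exc)
    print(message)

# Same name, different identity
print(OldGeoDataFrame.__name__ == NewGeoDataFrame.__name__)
print(OldGeoDataFrame is NewGeoDataFrame)
```

Restarting the interpreter, as described above, leaves only one copy of each class in memory, which is consistent with the identical code working afterwards.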