I have two data sets with point data in 'lat' and 'lon' columns. The same set of points of interest was mapped by two different people at two different points in time, and I want to check their consistency. Many of the points in the first data set therefore correspond (approximately, within a few meters) to a point in the second data set. For a small sample that's easy enough to check visually by plotting them on top of each other, but with several hundred points I need a spatial join. I found the sjoin_nearest method in geopandas.
At first I was using geopandas 0.9, but sjoin_nearest was only added in 0.10, so I updated using conda:
The following packages will be UPDATED:
ca-certificates 2023.05.30-hecd8cb5_0 --> 2024.3.11-hecd8cb5_0
certifi 2023.5.7-py39hecd8cb5_0 --> 2024.6.2-py39hecd8cb5_0
geopandas conda-forge/noarch::geopandas-0.9.0-p~ --> pkgs/main/osx-64::geopandas-0.12.2-py39hecd8cb5_0
geopandas-base conda-forge/noarch::geopandas-base-0.~ --> pkgs/main/osx-64::geopandas-base-0.12.2-py39hecd8cb5_0
openssl 1.1.1u-hca72f7f_0 --> 1.1.1w-hca72f7f_0
Now, I'm getting the following error from my code:
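import pandas as pd
import geopandas as gpd
from shapely.geometry import Point  # needed for the Point(...) constructor below
# Build point geometries from the lon/lat columns (WGS84, EPSG:4326)
df1 = pd.read_stata('df1.dta')
df1 = gpd.GeoDataFrame(df1, geometry=[Point(xy) for xy in zip(df1['lon'], df1['lat'])], crs="EPSG:4326")
df2 = pd.read_stata('df2.dta')
df2 = gpd.GeoDataFrame(df2, geometry=[Point(xy) for xy in zip(df2['lon'], df2['lat'])], crs="EPSG:4326")
# Nearest-neighbour join; max_distance and the 'dist' column are in CRS units (degrees here)
df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')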
Traceback (most recent call last):
File "<ipython-input-15-d6ba3cf8492f>", line 6, in <module>
df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/geodataframe.py", line 2173, in sjoin_nearest
return geopandas.sjoin_nearest(
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
_basic_checks(left_df, right_df, how, lsuffix, rsuffix)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
raise ValueError(
ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>
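A side note on the max_distance=0.01 argument above: geopandas measures distances in the units of the GeoDataFrame's CRS, so with EPSG:4326 the threshold (and the 'dist' column) is in degrees, not meters. The sketch below uses a haversine calculation to show roughly how far 0.01 degrees is on the ground. It assumes a spherical Earth with a mean radius of 6,371 km, and the helper name haversine_m and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical example: a 0.01-degree step north, and a 0.01-degree step east at 45 degrees N
north = haversine_m(45.0, 10.0, 45.01, 10.0)
east = haversine_m(45.0, 10.0, 45.0, 10.01)
print(f"0.01 deg north: {north:.0f} m")       # about 1.1 km
print(f"0.01 deg east at 45N: {east:.0f} m")  # shorter, because meridians converge
```

So a threshold of 0.01 degrees admits matches several hundred meters to about a kilometer apart, far more than "a few meters". If that matters, one option is to reproject both frames to a projected CRS in meters with to_crs (for example, the UTM zone suggested by estimate_utm_crs) before calling sjoin_nearest. max_distance and 'dist' are then in meters.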
I saw similarly titled questions here and here, but in both cases what the users passed to the routine were not GeoDataFrames (and the ValueError correctly pointed that out). In my case, both inputs are GeoDataFrames, so the ValueError seems to contradict itself. type(df1) and type(df2) both confirm that they are GeoDataFrames, and routine GeoDataFrame tasks like df1.plot() work just fine. I also checked the sjoin_nearest() documentation for deprecation warnings but came up empty-handed. Any leads appreciated.
Edit: MRE (from the sjoin_nearest documentation)
import geopandas as gpd
import geodatasets
groceries = gpd.read_file(
geodatasets.get_path("geoda.groceries")
)
chicago = gpd.read_file(
geodatasets.get_path("geoda.chicago_health")
).to_crs(groceries.crs)
groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)
Traceback (most recent call last):
File "<ipython-input-25-75e25adffbfd>", line 9, in <module>
groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
_basic_checks(left_df, right_df, how, lsuffix, rsuffix)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
raise ValueError(
ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>
This appears to have been a conda update hiccup. I had not restarted Spyder after the installation, because the sjoin_nearest command was being recognized after the update. Closing Spyder, deactivating the environment, activating it again, restarting Spyder, and running the identical code worked.
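One plausible explanation for the self-contradictory message is offered here as an assumption, not as something confirmed above. The check in _basic_checks is presumably an isinstance test. If geopandas is partly re-imported inside a running session, objects built from the old GeoDataFrame class fail an isinstance check against the new class, even though both print the same name. The sketch below mimics that situation with importlib.reload. The module fakegeo and its basic_check function are hypothetical stand-ins, not geopandas code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in module: one class plus a check that mirrors the error message
SOURCE = '''
class GeoDataFrame:
    pass

def basic_check(left_df):
    if not isinstance(left_df, GeoDataFrame):
        raise ValueError(f"'left_df' should be GeoDataFrame, got {type(left_df)}")
'''

tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "fakegeo.py").write_text(SOURCE)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import fakegeo

old_obj = fakegeo.GeoDataFrame()  # created from the class as first imported

importlib.reload(fakegeo)  # re-executes the module: GeoDataFrame is now a new class object

print(type(old_obj))                          # same qualified name as fakegeo.GeoDataFrame...
print(type(old_obj) is fakegeo.GeoDataFrame)  # ...but a different class object: False

try:
    fakegeo.basic_check(old_obj)
    message = None
except ValueError as err:
    message = str(err)
print(message)  # the "should be GeoDataFrame, got <same name>" contradiction
```

This would also explain why a clean restart fixed it. A fresh interpreter imports geopandas only once, so only one GeoDataFrame class exists.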