python geopandas shapely

How is this different behaviour possible? TypeError: unhashable type: 'Point'


I am trying to find and keep the Points in a GeoDataFrame (df1) that are close to Points in a second GeoDataFrame (df2), and vice versa. I use this piece of code for it:

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

df1 = df1[df1.geometry.isin(ps1)]
df2 = df2[df2.geometry.isin(ps2)]

However, I get an error on the last line: TypeError: unhashable type: 'Point'

But the line above it works like a charm, and the data types of the objects used in both lines (df1/df2 and ps1/ps2) are exactly the same.

How is that possible? And how can it be solved?

EDIT:

types of variables:

df1         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df1.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps1         :  <class 'list'>
val1        :  <class 'pandas.core.series.Series'>
df2         :  <class 'geopandas.geodataframe.GeoDataFrame'>
df2.geometry:  <class 'geopandas.geoseries.GeoSeries'>
ps2         :  <class 'list'>

EDIT 2:

df1.dtypes
Out[301]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

df2.dtypes
Out[302]: 
lat                     float64
lon                     float64
time        datetime64[ns, UTC]
geometry               geometry
dtype: object

MWE:

import pandas as pd
from pandas import Timestamp
import geopandas as gpd
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371000):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df1 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.37896949048437,
  2: 52.378654032960824,
  3: 52.37818902922923},
 'lon': {0: 4.88585622453752,
  1: 4.886671616078047,
  2: 4.886413945242339,
  3: 4.885995520636016},
 'time': {0: Timestamp('2019-11-05 11:31:42+0000', tz='UTC'),
  1: Timestamp('2019-11-05 11:32:22+0000', tz='UTC'),
  2: Timestamp('2019-11-05 11:32:49+0000', tz='UTC'),
  3: Timestamp('2019-11-05 11:33:31+0000', tz='UTC')}})
df2 = pd.DataFrame.from_dict({'lat': {0: 52.378851603519905,
  1: 52.369466977365214,
  2: 52.36923115238693,
  3: 52.36898222465506},
 'lon': {0: 4.88585622453752,
  1: 4.9121331184582,
  2: 4.912723204441477,
  3: 4.913505393878495},
 'time': {0: Timestamp('2019-11-05 08:54:32+0000', tz='UTC'),
  1: Timestamp('2019-11-05 08:55:06+0000', tz='UTC'),
  2: Timestamp('2019-11-05 08:55:40+0000', tz='UTC'),
  3: Timestamp('2019-11-05 08:56:22+0000', tz='UTC')}})

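# points_from_xy expects (x, y), i.e. (lon, lat), which matches the
# haversine(p1.y, p1.x, ...) call below where y is latitude and x is longitude
df1 = gpd.GeoDataFrame(df1, geometry=gpd.points_from_xy(df1.lon, df1.lat))
df2 = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2.lon, df2.lat))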

ps1 = []
ps2 = []
for p1 in df1.geometry:
    for p2 in df2.geometry:
        dist = haversine(p1.y,p1.x,p2.y,p2.x)
        if dist < 100:
            ps1.append(p1)
            ps2.append(p2)

val1 = gpd.GeoDataFrame(df1)
val2 = gpd.GeoDataFrame(df2)
# print(type(df1))
# print(type(df2))
# print(type(ps1))
# print(type(ps2))
print('df1         : ', type(df1))
print('df1.geometry: ', type(df1.geometry))
print('ps1         : ', type(ps1))
val1 = df1.geometry.isin(ps1)
print('val1        : ', type(val1))

print('df2         : ', type(df2))
print('df2.geometry: ', type(df2.geometry))
print('ps2         : ', type(ps2))
val2 = df2.geometry.isin(ps2)
print('val2        : ', type(val2))
# df1 = df1[df1.geometry.isin(ps1)]
# df2 = df2[df2.geometry.isin(ps2)]

Solution

  • As the error says, Point is not hashable (possibly as a result of a recent change?).

    It turns out that, for a reason I don't know, the pandas.Series.isin function seems to require the data to be hashable. See the question I just posted.

    As for your question, a workaround would be to build the mask with a plain list comprehension and convert it back to a Series, like this:

    # pass the original index so the mask aligns with df2 when used as df2[val2]
    val2 = pd.Series([v in ps2 for v in df2.geometry], index=df2.index)
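
To illustrate why a list-based check can work where a hash-based lookup fails, here is a minimal sketch in plain Python. FakePoint and is_hashable are hypothetical names made up for this illustration, not part of shapely or geopandas. The sketch only relies on the standard Python rule that a class defining __eq__ without __hash__ produces unhashable instances, and it assumes the real Point type behaves the same way in the version that raised the error.

```python
class FakePoint:
    """Hypothetical stand-in for a geometry class that defines __eq__ only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Defining __eq__ without __hash__ makes Python set __hash__ to None,
    # so instances cannot be used in sets, dict keys or hash tables.
    def __eq__(self, other):
        return isinstance(other, FakePoint) and (self.x, self.y) == (other.x, other.y)


def is_hashable(obj):
    """Return True if hash(obj) works, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


points = [FakePoint(4.88, 52.37), FakePoint(4.91, 52.36)]
wanted = [FakePoint(4.88, 52.37)]

# Membership in a list only needs ==, so no hashing is involved.
mask_list = [p in wanted for p in points]

# Coordinate tuples of floats are hashable, so a set lookup works on them.
wanted_keys = {(p.x, p.y) for p in wanted}
mask_keys = [(p.x, p.y) in wanted_keys for p in points]

print(is_hashable(points[0]))  # False
print(mask_list)               # [True, False]
print(mask_keys)               # [True, False]
```

The list version compares every pair with ==, so it gets slow for large inputs. The tuple-key version does a single set lookup per point, and the same idea should carry over to real Points through their x and y attributes, which the question already uses.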