I have a dataframe in which one column holds a Series of shapely Points, and another dataframe in which a column holds a Series of Polygons.
hash number street unit \
2024460 1a92a1c3cba7941a 485 AVENIDA DOUTOR SEVERIANO DE ALMEIDA NaN
2024461 837341c45de519a3 475 AVENIDA DOUTOR SEVERIANO DE ALMEIDA NaN
city district region postcode id geometry
2024459 Jaguari NaN RS 97760-000 NaN POINT (-54.69445 -29.49421)
2024460 Jaguari NaN RS 97760-000 NaN POINT (-54.69445 -29.49421)
2024461 Jaguari NaN RS 97760-000 NaN POINT (-54.69445 -29.49421)
centroids geometry
0 POINT (-29.31067315122428 -54.64176359828149) POLYGON ((-54.64069 -29.31161, -54.64069 -29.3...
1 POINT (-29.31067315122428 -54.63961783106958) POLYGON ((-54.63854 -29.31161, -54.63854 -29.3...
2 POINT (-29.31067315122428 -54.637472063857665) POLYGON ((-54.63640 -29.31161, -54.63640 -29.3...
I'm checking if the Point belongs to the Polygon and inserting the Point object into the cell of the second dataframe. However, I'm getting the following error:
Traceback (most recent call last):
File "/tmp/ipykernel_4771/1967309101.py", line 1, in <module>
df.loc[idx, 'centroids'] = poly_mun.loc[ix, 'centroids']
File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 692, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File ".local/lib/python3.8/site-packages/pandas/core/indexing.py", line 1599, in _setitem_with_indexer
self.obj[key] = infer_fill_value(value)
File ".local/lib/python3.8/site-packages/pandas/core/dtypes/missing.py", line 516, in infer_fill_value
val = np.array(val, copy=False)
TypeError: float() argument must be a string or a number, not 'Point'
I'm using the following line of code:
df.loc[idx, 'centroids'] = poly_df.loc[ix, 'centroids']
I have already tried using at instead of loc as well.
You can't create a new column in pandas with a shapely geometry using loc:
In [1]: import pandas as pd, shapely.geometry
In [2]: df = pd.DataFrame({'mycol': [1, 2, 3]})
In [3]: df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:1642: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
self.obj[key] = infer_fill_value(value)
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/dtypes/missing.py:550: FutureWarning: The input object of type 'Point' is an array-like implementing one of the corresponding protocols (`__array__`, `__array_interface__` or `__array_struct__`); but not a sequence (or 0-D). In the future, this object will be coerced as if it was first converted using `np.array(obj)`. To retain the old behaviour, you have to either modify the type 'Point', or assign to an empty array created with `np.empty(correct_shape, dtype=object)`.
val = np.array(val, copy=False)
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])
File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:716, in _LocationIndexer.__setitem__(self, key, value)
713 self._has_valid_setitem_indexer(key)
715 iloc = self if self.name == "iloc" else self.obj.iloc
--> 716 iloc._setitem_with_indexer(indexer, value, self.name)
File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/indexing.py:1642, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
1639 self.obj[key] = empty_value
1641 else:
-> 1642 self.obj[key] = infer_fill_value(value)
1644 new_indexer = convert_from_missing_indexer_tuple(
1645 indexer, self.obj.axes
1646 )
1647 self._setitem_with_indexer(new_indexer, value, name)
File ~/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/dtypes/missing.py:550, in infer_fill_value(val)
548 if not is_list_like(val):
549 val = [val]
--> 550 val = np.array(val, copy=False)
551 if needs_i8_conversion(val.dtype):
552 return np.array("NaT", dtype=val.dtype)
TypeError: float() argument must be a string or a real number, not 'Point'
Essentially, pandas doesn't know how to interpret a Point object. When loc has to create the column, it calls infer_fill_value, which passes the Point to np.array to work out a dtype, and that conversion fails (see the last frame of the traceback). This might get fixed in the future, but you're best off explicitly creating the column with object dtype first:
In [27]: df['centroid'] = None
In [28]: df['centroid'] = df['centroid'].astype(object)
In [29]: df
mycol centroid
0 1 None
1 2 None
2 3 None
In [30]: df.loc[0, "centroid"] = shapely.geometry.Point([0, 0])
/Users/mikedelgado/opt/miniconda3/envs/rhodium-env/lib/python3.10/site-packages/pandas/core/internals/managers.py:304: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
applied = getattr(b, f)(**kwargs)
In [31]: df
mycol centroid
0 1 POINT (0 0)
1 2 None
2 3 None
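As a self-contained version of the same idea, here is a minimal sketch that creates the object column in one step and then fills in several cells. The `matches` dict is a hypothetical stand-in for whatever point-in-polygon check produces your row labels and centroids; it is not from the question.

```python
import pandas as pd
from shapely.geometry import Point

df = pd.DataFrame({"mycol": [1, 2, 3]})

# Create the column up front with object dtype, so pandas never has to
# infer a fill value from a Point.
df["centroid"] = pd.Series([None] * len(df), index=df.index, dtype=object)

# Hypothetical result of the matching step: row label -> centroid.
matches = {0: Point(0, 0), 2: Point(1, 1)}

# Assign each centroid into its existing object-dtype cell.
for idx, centroid in matches.items():
    df.loc[idx, "centroid"] = centroid

print(df)
```

Rows without a match keep their None placeholder, so they are easy to filter out afterwards with `df["centroid"].notna()`.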
That said, joining two GeoDataFrames of polygons and points based on whether the points fall inside the polygons certainly sounds like a job for geopandas.sjoin:
import geopandas as gpd
union = gpd.sjoin(polygon_df, points_df, predicate='contains')  # the keyword was `op=` in geopandas < 0.10
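To make the spatial join concrete, here is a small sketch on made-up data. The two unit squares, the three points, and the `name` and `street` values are invented for illustration. It assumes a geopandas version that accepts the `predicate` keyword (0.10 or later).

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two toy polygons (unit squares), invented for illustration.
polygon_df = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
)

# Three toy points: one inside each square, one outside both.
points_df = gpd.GeoDataFrame(
    {"street": ["x", "y", "z"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
)

# Default how="inner": one output row per (polygon, contained point) pair.
# The polygon geometry is kept; the matching point's row label lands in
# the "index_right" column next to its attributes.
joined = gpd.sjoin(polygon_df, points_df, predicate="contains")
print(joined[["name", "street", "index_right"]])
```

The point outside both squares is dropped by the inner join. Passing `how="left"` would instead keep every polygon, including those with no matching point.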