I have two datasets in GeoJSON format, both loaded with GeoPandas. The first contains the coordinates of various points in a city.
Table gdf:
╔═══╦════════╦══════════╦══════════════════════╦═════════╦══════════╦══════════════════════════╗
║ ║ id ║ layer_id ║ title ║ lon ║ lat ║ geometry ║
╠═══╬════════╬══════════╬══════════════════════╬═════════╬══════════╬══════════════════════════╣
║ 0 ║ 83969 ║ 12 ║ Garces ║ 2.15351 ║ 41.37926 ║ POINT (2.15351 41.37926) ║
╠═══╬════════╬══════════╬══════════════════════╬═════════╬══════════╬══════════════════════════╣
║ 1 ║ 86258 ║ 146 ║ Ritsch ║ 2.16235 ║ 41.38429 ║ POINT (2.16235 41.38429) ║
╠═══╬════════╬══════════╬══════════════════════╬═════════╬══════════╬══════════════════════════╣
║ 2 ║ 83964 ║ 40 ║ Lunch & Catering Bar ║ 2.15368 ║ 41.37913 ║ POINT (2.15368 41.37913) ║
╠═══╬════════╬══════════╬══════════════════════╬═════════╬══════════╬══════════════════════════╣
║ 3 ║ 83970 ║ 8 ║ Galaxia ║ 2.15343 ║ 41.37932 ║ POINT (2.15343 41.37932) ║
╠═══╬════════╬══════════╬══════════════════════╬═════════╬══════════╬══════════════════════════╣
║ 4 ║ 74866 ║ 40 ║ Celler de l`Abi ║ 2.14207 ║ 41.3694 ║ POINT (2.14207 41.36941) ║
╚═══╩════════╩══════════╩══════════════════════╩═════════╩══════════╩══════════════════════════╝
The second dataset contains polygons for different areas of the same city.
Table polys:
╔═══╦═══════╦═════╦═══════════════════════════════════════════════════╗
║ ║ Name ║ ... ║ geometry ║
╠═══╬═══════╬═════╬═══════════════════════════════════════════════════╣
║ 0 ║ aoi_1 ║ ... ║ POLYGON Z ((2.13049 41.38221 0.00000, 2.13101 ... ║
╠═══╬═══════╬═════╬═══════════════════════════════════════════════════╣
║ 1 ║ aoi_2 ║ ... ║ POLYGON Z ((2.14463 41.39321 0.00000, 2.14495 ... ║
╠═══╬═══════╬═════╬═══════════════════════════════════════════════════╣
║ 2 ║ aoi_3 ║ ... ║ POLYGON Z ((2.14592 41.39374 0.00000, 2.14613 ... ║
╠═══╬═══════╬═════╬═══════════════════════════════════════════════════╣
║ 3 ║ aoi_4 ║ ... ║ POLYGON Z ((2.14860 41.39433 0.00000, 2.14884 ... ║
╠═══╬═══════╬═════╬═══════════════════════════════════════════════════╣
║ 4 ║ aoi_5 ║ ... ║ POLYGON Z ((2.14845 41.39443 0.00000, 2.14873 ... ║
╚═══╩═══════╩═════╩═══════════════════════════════════════════════════╝
What I want to do is determine which points lie within each of the polygons and generate a new dataset from the result. I would also like to know the most efficient way to do it, because I have around 100 polygons and around 100K points to check against them.
Using Shapely and its contains method, this is what I am trying to do:
inside = gdf[gdf.apply(lambda row: polys.contains(Point(row.lon, row.lat)), axis=1)]
The problem is that I don't want only the points within a single polygon (which the code above does); I want all the points, together with the polygons each one falls in.
I found what I was looking for, and it goes something like this:
points_inside = gpd.sjoin(gdf, polys[['Name', 'geometry']], predicate='within')  # the keyword was 'op' before GeoPandas 0.10
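To make the spatial join concrete, here is a minimal, self-contained sketch. The toy points and square polygons are made up for illustration and only stand in for the real gdf and polys; it also assumes GeoPandas 0.10 or later, where the keyword is predicate rather than op.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical toy data standing in for the real gdf and polys.
gdf = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)
polys = gpd.GeoDataFrame(
    {"Name": ["aoi_1", "aoi_2"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Inner join (the default): keep only the points that fall within
# some polygon, each row tagged with that polygon's Name.
points_inside = gpd.sjoin(gdf, polys[["Name", "geometry"]], predicate="within")

# Left join: keep every point; Name is NaN for points outside all polygons.
all_points = gpd.sjoin(
    gdf, polys[["Name", "geometry"]], how="left", predicate="within"
)

# Number of points found in each polygon.
counts = points_inside.groupby("Name").size()
```

As far as I can tell, sjoin uses a spatial index internally, so it should scale much better to roughly 100 polygons and 100K points than a row-wise apply with contains. If a point falls in overlapping polygons, it appears once per matching polygon in the result.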