I have a large geodataframe (potentially millions of rows) which consists of LineString geometries. Some of these lines could intersect each other, whereas others don't intersect. I would like to remove all lines that intersect in an efficient manner. I have a solution using "apply" and a shapely function to find intersections, but it is very slow on large datasets. Any help would be much appreciated!
My solution so far with a small example geodataframe is below. This works but is slow when scaled up.
from shapely.geometry import LineString
import geopandas as gpd
import numpy as np
def doesIntersect(line_A, line_B):
return line_A.intersects(line_B)
def check_intersections(row, lines):
# remove this row from the lines geoseries so we don't check against itself
l = lines.drop(row.name)
# return a boolean array to check if lines intersect
intersects = [ doesIntersect(row['geometry'], x) for x in l ]
# get the index of the first intersecting line
idx = np.where(intersects)[0]
# add the geometry of the intersecting line to the original geodataframe
if not (len(idx) == 0):
intersect_line = l.iloc[idx[0]]
row['intersect_line'] = intersect_line
print(intersect_line)
else:
row['intersect_line'] = 0
print("no intersection")
return row
# create an example gdf of intersecting lines
gdf = gpd.GeoDataFrame({'id': [1, 2, 3]}, geometry=[LineString([(1, 1), (4, 4)]),
LineString([(1, 4), (4, 1)]),
LineString([(6, 1), (6, 6)])])
# get the lines as a geoseries
lines = gdf['geometry']
# apply the function to the geodataframe
gdf = gdf.apply(check_intersections, axis=1, lines=lines)
# now drop intersecting lines
gdf = gdf[gdf['intersect_line'] == 0].drop(columns='intersect_line')
print(gdf)
How about a spatial join that will utilize a built-in R-tree spatial index?
import geopandas as gpd
import pandas as pd
gdf = gpd.GeoDataFrame({'id': [1, 2, 3]}, geometry=[LineString([(1, 1), (4, 4)]),
LineString([(1, 4), (4, 1)]),
LineString([(6, 1), (6, 6)])])
# make a copy of the original gdf for spatial join
gdf_copy = gdf.copy(deep=True)
# spaital join the copy to the original gdf
gdf_sj = gpd.sjoin(gdf, gdf_copy)
gdf_sj_vc = gdf_sj['id_left'].value_counts()
# value counts for intersecting lines will be >=2
int_mask = gdf_sj_vc.ge(2)
# add the mask to gdf_sj as intersects columnn
gdf_sj['intersects'] = gdf_sj['id_left'].map(int_mask)
# drop non-intersecting lines
gdf_no_intersect = gdf_sj.loc[gdf_sj['intersects'] == False]
print(gdf_no_intersect)
id_left geometry index_right id_right intersects
2 3 LINESTRING (6.00000 1.00000, 6.00000 6.00000) 2 3 False