Search code examples
pythongeopandasshapely

Removing intersecting lines in geopandas


I have a large geodataframe (potentially millions of rows) which consists of LineString geometries. Some of these lines could intersect each other, whereas others don't intersect. I would like to remove all lines that intersect in an efficient manner. I have a solution using "apply" and a shapely function to find intersections, but it is very slow on large datasets. Any help would be much appreciated!

My solution so far with a small example geodataframe is below. This works but is slow when scaled up.


from shapely.geometry import LineString
import geopandas as gpd
import numpy as np

def doesIntersect(line_A, line_B):
    return line_A.intersects(line_B)

def check_intersections(row, lines):
    # remove this row from the lines geoseries so we don't check against itself
    l = lines.drop(row.name)
    # return a boolean array to check if lines intersect
    intersects = [ doesIntersect(row['geometry'], x) for x in l ]

    # get the index of the first intersecting line
    idx = np.where(intersects)[0]

    # add the geometry of the intersecting line to the original geodataframe
    if not (len(idx) == 0):
        intersect_line = l.iloc[idx[0]]
        row['intersect_line'] = intersect_line
        print(intersect_line)
    else:
        row['intersect_line'] = 0
        print("no intersection")
    return row

# create an example gdf of intersecting lines
gdf = gpd.GeoDataFrame({'id': [1, 2, 3]}, geometry=[LineString([(1, 1), (4, 4)]),
                                                    LineString([(1, 4), (4, 1)]),
                                                    LineString([(6, 1), (6, 6)])])

# get the lines as a geoseries
lines = gdf['geometry']
# apply the function to the geodataframe
gdf = gdf.apply(check_intersections, axis=1, lines=lines)

# now drop intersecting lines
gdf = gdf[gdf['intersect_line'] == 0].drop(columns='intersect_line')
print(gdf)

Solution

  • How about a spatial join that will utilize a built-in R-tree spatial index?

    import geopandas as gpd
    import pandas as pd
    
    
    gdf = gpd.GeoDataFrame({'id': [1, 2, 3]}, geometry=[LineString([(1, 1), (4, 4)]),
                                                        LineString([(1, 4), (4, 1)]),
                                                        LineString([(6, 1), (6, 6)])])
    # make a copy of the original gdf for spatial join
    gdf_copy = gdf.copy(deep=True)
    # spaital join the copy to the original gdf
    gdf_sj = gpd.sjoin(gdf, gdf_copy)
    
    gdf_sj_vc = gdf_sj['id_left'].value_counts()
    # value counts for intersecting lines will be >=2 
    int_mask = gdf_sj_vc.ge(2)
    # add the mask to gdf_sj as intersects columnn
    gdf_sj['intersects'] = gdf_sj['id_left'].map(int_mask)
    # drop non-intersecting lines
    gdf_no_intersect = gdf_sj.loc[gdf_sj['intersects'] == False]
    
    print(gdf_no_intersect)
    
    id_left geometry                                  index_right   id_right    intersects
    2   3   LINESTRING (6.00000 1.00000, 6.00000 6.00000)          2    3       False