I have a list of lines defined by start and end points. There are on the order of hundreds of thousands of lines, possibly up to the low millions. To make points I use points_from_xy in GeoPandas, which is highly optimized. Is there a similarly fast way to make LineStrings in GeoPandas/Shapely?
My current method is below, but I can't think of a way to avoid the explicit loop:
```python
from shapely.geometry import LineString
lines = [LineString([(start_x[i], start_y[i]), (end_x[i], end_y[i])]) for i in range(n_pts)]
```
You can use points_from_xy to build two GeometryArrays, then use some sneaky geometric set operations and constructive methods to get the result. Specifically, the convex_hull of two distinct points is a line :) (If a start point and end point coincide, the hull degenerates to a Point instead.)
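To see the trick on a single pair before scaling it up, here is a small sketch with plain Shapely geometries. It only illustrates the geometry (union of two points, then convex hull), including the degenerate case where both points are the same; the coordinates are made up for illustration.

```python
from shapely.geometry import MultiPoint, Point

# Two distinct points: the convex hull collapses to the segment between them.
hull = MultiPoint([(0, 0), (1, 1)]).convex_hull
print(hull.geom_type)  # LineString

# Same idea via a union of two Point objects, mirroring points1.union(points2).
hull2 = Point(0, 0).union(Point(3, 4)).convex_hull
print(hull2.geom_type, hull2.length)  # LineString 5.0

# Degenerate case: identical start and end points give a Point, not a line.
degenerate = Point(2, 2).union(Point(2, 2)).convex_hull
print(degenerate.geom_type)  # Point
```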
```python
# setup
import numpy as np, geopandas as gpd, shapely.geometry
N = int(1e7)
x1, x2, y1, y2 = (np.random.random(size=N) for _ in range(4))
```
Running the following with 10 million points finishes in a manageable amount of time:
```
In [3]: %%time
   ...:
   ...: points1 = gpd.points_from_xy(x1, y1)
   ...: points2 = gpd.points_from_xy(x2, y2)
   ...: lines = points1.union(points2).convex_hull
   ...:
CPU times: user 18 s, sys: 4.93 s, total: 22.9 s
Wall time: 25 s
```
The result is a GeometryArray of LineString objects:
```
In [4]: lines
Out[4]:
<GeometryArray>
[<shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 <shapely.geometry.linestring.LineString object at 0x186e78880>,
 <shapely.geometry.linestring.LineString object at 0x186e78d60>,
 ...
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>,
 <shapely.geometry.linestring.LineString object at 0x186e79e70>,
 <shapely.geometry.linestring.LineString object at 0x186e7bac0>]
Length: 10000000, dtype: geometry
```
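If you're on Shapely 2.0 or later, the same convex-hull trick can be written with Shapely's vectorized top-level functions (shapely.points, shapely.union, shapely.convex_hull) without going through GeoPandas. This is a minimal sketch assuming Shapely >= 2.0; segments_via_hull is a hypothetical helper name and the random data is only for illustration. As far as I know the hull makes no promise about keeping each line's start-to-end direction, so I wouldn't rely on it, and pairs whose start and end coincide come back as Points.

```python
import numpy as np
import shapely  # assumes Shapely >= 2.0 (vectorized top-level functions)

def segments_via_hull(x1, y1, x2, y2):
    """Hypothetical helper: one line per (x1, y1)-(x2, y2) pair via the convex-hull trick."""
    p1 = shapely.points(x1, y1)        # array of start Points
    p2 = shapely.points(x2, y2)        # array of end Points
    pairs = shapely.union(p1, p2)      # element-wise union -> one MultiPoint per pair
    return shapely.convex_hull(pairs)  # hull of two distinct points is a LineString

rng = np.random.default_rng(0)
n = 1_000
x1, y1, x2, y2 = (rng.random(n) for _ in range(4))
lines = segments_via_hull(x1, y1, x2, y2)
```

The result is a NumPy object array of geometries; wrap it with gpd.GeoSeries(lines) if you need GeoPandas functionality on top.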
For comparison, I tried building the lines with shapely.geometry.LineString in a list comprehension using a tenth as many points (1e6), and it took 23.8 seconds. I got bored waiting for it to finish with 1e7 points...
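Shapely 2.0 also provides shapely.linestrings, which builds the lines straight from a coordinate array with no set operations and keeps each line's vertices in start-to-end order. A minimal sketch, assuming Shapely >= 2.0; segments_direct and segments_loop are hypothetical helper names, and the loop version is there only to check that both approaches produce identical coordinates. I haven't timed this here, so benchmark it against the hull approach on your own data.

```python
import numpy as np
import shapely  # assumes Shapely >= 2.0
from shapely.geometry import LineString

def segments_direct(x1, y1, x2, y2):
    """Hypothetical helper: build LineStrings with a single vectorized call."""
    starts = np.column_stack([x1, y1])         # shape (n, 2)
    ends = np.column_stack([x2, y2])           # shape (n, 2)
    coords = np.stack([starts, ends], axis=1)  # shape (n, 2 vertices, 2 dims)
    return shapely.linestrings(coords)         # one LineString per row, start -> end

def segments_loop(x1, y1, x2, y2):
    """Hypothetical helper: the explicit-loop baseline, for comparison."""
    return [LineString([(x1[i], y1[i]), (x2[i], y2[i])]) for i in range(len(x1))]

rng = np.random.default_rng(42)
n = 10_000
x1, y1, x2, y2 = (rng.random(n) for _ in range(4))

fast = segments_direct(x1, y1, x2, y2)
slow = segments_loop(x1, y1, x2, y2)
```

As with the hull version, the array returned by segments_direct can be passed to gpd.GeoSeries or used as the geometry column of a GeoDataFrame.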