Search code examples
pythoncythonshapely

Which type declarations do I have to use in Cython for Shapely Objects


I am new to cython and this might be a beginners question. I have a piece of python code (just a single function) that gets executed several thousand times during my script. Therefore, speeding it up (even if just slightly) would largely improve runtime of the script. I tried to do so with Cython. Reading some tutorials, I learned, that a first step would be to add the correct C type declarations to the function. However, since this function uses Shapely objects, I don't know which types to use.

This is the python code:

import numpy as np
from shapely.geometry import LineString, Point
from shapely.linear import line_locate_point
from shapely.strtree import STRtree

def xy2sd(traj, centerline):
    '''
    Transforms a set of points into the frenet frame given by centerline
    traj: np.ndarray of shape [N, 2] where N is the number of points in the trajectory
    centerline: np.ndarray of shape [M, 2] where M is the number of points in the centerline
    '''
    line = LineString(centerline)
    s_orig = line_locate_point(line, Point([0, 0]))

    s = np.zeros(traj.shape[0])
    d = np.zeros(traj.shape[0])

    nth_entry = 100
    tree = STRtree([Point(x) for id, x in enumerate(line.coords) if id % nth_entry == 0])

    for i in range(traj.shape[0]):
        pnt = Point(traj[i])
        s[i] = line.line_locate_point(pnt)
        d[i] = pnt.distance(line)

        min_idx = tree.nearest(pnt)*nth_entry

        pnt_min_dist = centerline[min_idx]
        pnt_behind = centerline[min_idx-1]

        sign = (pnt_behind[0]-pnt_min_dist[0])*(pnt.y-pnt_min_dist[1]) - \
            (pnt_behind[1]-pnt_min_dist[1])*(pnt.x-pnt_min_dist[0])
        sign = 1 if sign >= 0 else -1 if sign < 0 else 0

        d[i] = sign*d[i]

    sd_coords = np.zeros(traj.shape, dtype=np.float32)
    sd_coords[:, 0] = s - s_orig
    sd_coords[:, 1] = d

    return sd_coords

So far i changed it to the following Cython code

cimport cython
cimport numpy as np

import numpy as np
from shapely.geometry import LineString, Point
from shapely.linear import line_locate_point
from shapely.strtree import STRtree

def xy2sd(traj, centerline):
    '''
    Transforms a set of points into the frenet frame given by centerline
    traj: np.ndarray of shape [N, 2] where N is the number of points in the trajectory
    centerline: np.ndarray of shape [M, 2] where M is the number of points in the centerline
    '''
    line = LineString(centerline)
    s_orig = line_locate_point(line, Point([0, 0]))

    cdef np.ndarray s = np.zeros(traj.shape[0])
    cdef np.ndarray d = np.zeros(traj.shape[0])

    cdef int nth_entry = 100
    tree = STRtree([Point(x) for id, x in enumerate(line.coords) if id % nth_entry == 0])

    cdef int i
    cdef int min_idx
    cdef np.ndarray pnt_min_dist
    cdef np.ndarray pnt_behind
    cdef double cross_prod
    cdef int sign
    for i in range(traj.shape[0]):
        pnt = Point(traj[i])
        s[i] = line.line_locate_point(pnt)
        d[i] = pnt.distance(line)

        min_idx = tree.nearest(pnt)*nth_entry

        pnt_min_dist = centerline[min_idx]
        pnt_behind = centerline[min_idx-1]

        cross_prod = (pnt_behind[0]-pnt_min_dist[0])*(pnt.y-pnt_min_dist[1]) - \
            (pnt_behind[1]-pnt_min_dist[1])*(pnt.x-pnt_min_dist[0])
        sign = 1 if cross_prod >= 0 else -1 if cross_prod < 0 else 0

        d[i] = sign*d[i]

    cdef np.ndarray sd_coords = np.zeros(traj.shape, dtype=np.float32)
    sd_coords[:, 0] = s - s_orig
    sd_coords[:, 1] = d

    return sd_coords

I am able to compile this code, however i did not notice any speedup. I suppose, this is due to the lacking type declarations for the shapely objects of line tree and pnt. Which types do I have to use there? Can I expect major speedups from just adding the types or is there anything else I can do?

Thanks!


Solution

  • Probably the best way to work out what you would want to do is see how it's done in shapely itself, for example in geometry_helpers.pyx

    It would likely look something like this:

    import shapely
    
    # import the types / functions you need from shapely here
    from shapely._geos cimport get_geos_handle, ...
    from shapely._pygeos_api cimport import_shapely_c_api, ...
    
    import_shapely_c_api()
    
    # your actual code ...
    

    However shapely already uses cython internally on the slow bits, so there might not be many easy wins left.

    Alternative Idea

    I think you can make better use of shapely to get a decent speedup without the extra complexity of cython though - some ideas:

    • line.line_locate_point(pnt) and pnt.distance(line) don't seem to have any index so will be pretty slow, would avoid if possible and use a tree.query instead
    • creating lots of geometry seems very slow (e.g. [Point(x) for ...]) ideally batch these up e.g. via shapely.points(array)
    • tree.nearest(...) can take an array of geometries and seems much faster if done in one go the one-at-a-time.

    Putting some of this together I get something like the below which ran about 30x faster than the original for me, though is not quite identical.

    import shapely
    
    def xy2sd_alternate(traj, centerline):
        # RTree of all the line segments in the centerline
        tmp = np.repeat(centerline, repeats=2, axis=0)[1:-1]
        ind = np.arange(tmp.shape[0])//2
        lns = shapely.linestrings(tmp, indices=ind)
        tree = shapely.STRtree(lns)
    
        # distances along the centerline at each of its points
        dists = np.zeros(centerline.shape[0])
        dists[1:] = ((np.diff(cl, axis=0)**2).sum(axis=1)**0.5).cumsum()
    
        # get the indexes of the closest centerline line-segment for each trajectory point
        nearest_line_idx = tree.nearest(shapely.points(traj))
    
        # convert to frenet frame coords
        sd = np.zeros(traj.shape, dtype=np.float32)
        for i in range(traj.shape[0]):
            idx = nearest_line_idx[i]
    
            # (bx,by) = normalized vector for the closest line segment
            segment_length = dists[idx+1] - dists[idx]
            bx = (centerline[idx+1, 0] - centerline[idx, 0]) / segment_length
            by = (centerline[idx+1, 1] - centerline[idx, 1]) / segment_length
    
            # (tx,ty) = vector from start of closest segment to trajectory point
            tx = traj[i,0] - centerline[idx, 0]
            ty = traj[i,1] - centerline[idx, 1]
    
            # unrotate the trajectory point by the angle of the line segment to get to (s,d) coordinates
            sd[i, 0] = (bx*tx + by*ty) + dists[idx]
            sd[i, 1] = (by*tx - bx*ty)
    
        return sd