python pytorch parallel-processing geometry

Finding rectangles enclosing points

I have a list of n 2D points (x, y) and a list of m rectangles (xmin, ymin, xmax, ymax). I want to find indices of all rectangles that enclose a point. This has to be done efficiently. GPU is also available. I'm looking at a solution that is heavily parallelised through vectorised operations. For loops are a no-go for me as 0 < n,m < 10^6

For example:

points = [[1, 1], [2, 2], [5, 5]]
rectangles = [[0, 0, 3, 3], [1, 1, 4, 4], [4, 4, 7, 7]]

result:
(1, 1): [(0, 0, 3, 3), (1, 1, 4, 4)]
(2, 2): [(0, 0, 3, 3), (1, 1, 4, 4)]
(5, 5): [(4, 4, 7, 7)]

I've written this solution in PyTorch (tried chunking to reduce memory overflow)

def find_rectangles_containing_points(points, rectangles, chunk_size=1000):
    n = points.size(0)
    m = rectangles.size(0)

    rectangles_containing_points = []

    for i in range(0, n, chunk_size):
        points_chunk = points[i:i + chunk_size]
        points_expanded = points_chunk.unsqueeze(1)  # chunk_size x 1 x 2
        rectangles_expanded = rectangles.unsqueeze(0)  # 1 x m x 4

        is_inside = (rectangles_expanded[:, :, :2] <= points_expanded) & (points_expanded <= rectangles_expanded[:, :, 2:])

        result = torch.nonzero(is_inside.all(dim=-1))
        rectangles_containing_points.extend([list(rect_ids) for rect_ids in result.t()])

    return rectangles_containing_points

points = torch.rand(1000000, 2)
rectangles = torch.rand(10000, 4)

find_rectangles_containing_points(points, rectangles)

However, this solution doesn't scale to larger values of n and m (~10^6). Are there approaches designed for such specific problems?

Solution

I'd try to get to your problem the other way around by first assessing what points fall into a given rectangle. This operation is quite straightforward and can easily be vectorized:

import numpy as np

# points as numpy array and rectangle coordinates [x_min, y_min, x_max, y_max]

in_rectangle = np.logical_and(
    np.logical_and(points[:, 0] >= x_min, points[:, 0] <= x_max),
    np.logical_and(points[:, 1] >= y_min, points[:, 1] <= y_max)
)

You can then repeat this operation for your m rectangles.

Note that lists are not best for storing this type of result. On the other hand, an (m, n) array would likely not fit in memory if n and m ~ 1e6. That's why you could use a sparse matrix (that only stores non-zero values).

Below, S[i, j] = 1 if point j is in rectangle i.

import numpy as np
from scipy.sparse import dok_matrix

points = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
rectangles = np.array([[0, 0, 3, 3], [1, 1, 4, 4], [4, 4, 7, 7]])

S = dok_matrix((len(rectangles), len(points)), dtype=np.float32)

for i, coords in enumerate(rectangles):
    
    x_min, y_min, x_max, y_max = coords
    
    in_rectangle = np.logical_and(
        np.logical_and(points[:, 0] >= x_min, points[:, 0] <= x_max),
        np.logical_and(points[:, 1] >= y_min, points[:, 1] <= y_max)
    )
    
    S[i, np.nonzero(in_rectangle)] = 1

print(S.toarray())

Once the matrix is filled, S[:, j] gives whether a rectangle i contains the point j (1 if so, no value if not). You may use the nonzero function to directly get the rectangles indices.

I'm not very familiar with GPU computing but I think you could tweak this code to make it work with PyTorch or JAX instead of Numpy.