Search code examples
pythonnumpyscikit-learnnearest-neighborpoint-clouds

Use each neighbor once in sklearn NearestNeighbor


I am comparing two point clouds of different sizes. I don't want to cutoff the last points in the larger pointcloud pc1. For points in pc1 I would like to find the nearest neighbor in pc2. After using this point in pc1 and pc2 it should not be used again for any other comparison. Calculate distances from pc1 to pc2: Start from lowest distance. Take both points in pc1 and pc2 and flag the points as used (or delete). Get next pair with next higher distance ... Until all points in pc2 are used. Return List of distances.

Here is what I tried so far:

from sklearn.neighbors import NearestNeighbors
import numpy as np

pc1 = np.array([[-1, -1], [-2, -1], [1, 1], [2, 2], [3, 5]])
pc2 = np.array([[0, 0], [0, 0], [6,6]])

nbrs = NearestNeighbors(n_neighbors=1, algorithm='kd_tree', metric='euclidean').fit(pc2)
distances, indices = nbrs.kneighbors(pc1)

This is the output:

indices = [[0],[0],[0],[0],[2]]

But I want to have:

indices_wanted = [[0],[1],[2]]

which should refer to the points in pc1: [[-1, -1], [1, 1], [3, 5]]

Is there an efficient way of doing that? My pointclouds are in 3D and I have approximately 8000 points per cloud. It should be very fast, as I need to repeat this procedure for each frame in some "movie" with 200 frames. Create some sample data in 3D:

    pc1 = np.random.randn(8000,3)
    pc2 = np.random.randn(7990,3)

Here is an image to clarify what the situation is: Points in Cube

The red points are pc1 and the green ones are pc2. Ultimately, I have 3D points.


Edit:

I am not restricted to sklearn, I just know it is very efficient. So KDTree is also an option. I may have to include a maximum distance for two nodes for speeding up this approach. The cube I am using is of size 4m (I already delete points that are too far off before running the neighbor search)


Problem: The code provided by @user2874583 takes around 0.8s for one set with 8000 points. That is way too slow. I need less than 0.2s. Is there a way to modify the code to making use of the structure: Cube with no points outside? Maybe sorting the array?


Solution

  • You can use the scipy libary to calculate the distance between to points.

    from scipy.spatial.distance import cdist
    
    def closest_node_index(node, nodes):
        index = cdist([node], nodes).argmin()
        return index
    
    final = []
    for arr in pc2:
        i = closest_node_index(arr, pc1)
        final.append(pc1[i])
        pc1 = np.delete(pc1, i, axis=0)