python numpy matrix indexing point-clouds

NumPy unique() returns indices that are out-of-bounds


I am trying to remove points from a point cloud that are too close to each other. My input is an m x 3 matrix whose columns hold the x, y, and z coordinates. The code is as follows:

import numpy as np

def remove_duplicates(points, threshold):
    # Convert to numpy
    points = np.array(points)

    # Round to within the threshold
    rounded_points = points
    if threshold > 0.0:
        rounded_points = np.round(points/threshold)*threshold

    # Remove duplicate points
    point_tuples = [tuple(point) for point in rounded_points]
    unique_rounded_points, unique_indices = np.unique(point_tuples, return_index = True)

    points = points[unique_indices]

    return points

The issue I am running into is that unique_indices contains values larger than the length of points (2265 and 1000 for my test data). Am I doing something wrong, or is this a bug in NumPy?

Edit: I should note that for very small inputs (tried 27 points), unique() appears to work correctly.


Solution

  • So points is a 2d array, (m,3) in shape, right?

    point_tuples is a list of tuples, i.e. each row of rounded_points is now a tuple of 3 floats.

    np.unique is going to turn that list back into an array to do its thing.

    np.array(point_tuples) is a (m,3) array (again 2d like points). The tuple did nothing.

    Without an axis argument, unique acts on the raveled (flattened) form of this array, so unique_indices can hold values anywhere from 0 to 3*m - 1. Hence your out-of-bounds indices.
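
Here is a small sketch of that flattening, using four made-up points (the data is only for illustration, not taken from the question):

```python
import numpy as np

# Four hypothetical points; rows 0 and 2 are identical.
points = np.array([[0.0, 1.0, 2.0],
                   [3.0, 4.0, 5.0],
                   [0.0, 1.0, 2.0],
                   [6.0, 7.0, 8.0]])
point_tuples = [tuple(p) for p in points]

# The list of tuples is converted straight back to a 2-D array.
as_array = np.array(point_tuples)

# With no axis argument, np.unique works on the flattened 12-element array.
values, flat_indices = np.unique(point_tuples, return_index=True)

print(as_array.shape)   # the tuples are gone, this is 2-D again
print(values)           # unique scalars, not unique rows
print(flat_indices)     # positions in the flattened array
```

The largest index returned here is bigger than the number of rows, which is the same symptom as in the question.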

    I see two problems. First, if you want unique to find unique 'rows', you need to make a structured array:

    np.array(point_tuples, 'f,f,f')
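
A minimal sketch of the structured-array route, again on made-up points. I use 'f8,f8,f8' (float64 fields) here as an assumption so the values are not narrowed; 'f,f,f' as written above gives float32 fields.

```python
import numpy as np

# Hypothetical data: rows 0 and 2 are identical.
points = np.array([[0.0, 1.0, 2.0],
                   [3.0, 4.0, 5.0],
                   [0.0, 1.0, 2.0],
                   [6.0, 7.0, 8.0]])
point_tuples = [tuple(p) for p in points]

# Each tuple becomes one record with three float64 fields,
# so the array is 1-D with one element per point.
records = np.array(point_tuples, dtype="f8,f8,f8")

# unique now compares whole records, and the indices are row numbers.
unique_records, unique_indices = np.unique(records, return_index=True)

print(records.shape)
print(unique_indices)
print(points[unique_indices])
```

Here the tuples do matter: with a compound dtype, NumPy reads each tuple as one record instead of as a row of three separate numbers.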
    

    Second, applying unique to floats is tricky. Two floats that come from different computations are rarely exactly equal, even when they should be. Rounding reduces this problem but does not eliminate it.

    So it is probably better to use round in such a way that rounded_points is an array of integers. The values don't need to be scaled back to match points.
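
One possible way to put the integer idea together is sketched below. It is not the only fix: instead of a structured array it uses the axis=0 argument of np.unique (available in NumPy 1.13 and later), which is my substitution, and the int64 key type and the final sort are also my choices.

```python
import numpy as np

def remove_duplicates(points, threshold):
    points = np.asarray(points, dtype=float)

    if threshold > 0.0:
        # Integer grid-cell index per coordinate; never scaled back.
        keys = np.round(points / threshold).astype(np.int64)
    else:
        # No tolerance requested: compare the raw coordinates exactly.
        keys = points

    # axis=0 makes unique compare whole rows instead of flattening.
    _, unique_indices = np.unique(keys, axis=0, return_index=True)

    # Sort the indices so the surviving points keep their input order.
    return points[np.sort(unique_indices)]

# Hypothetical data: the first two points fall in the same 0.1-wide cell.
cloud = [[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [1.0, 1.0, 1.0]]
result = remove_duplicates(cloud, 0.1)
print(result)
```

Note that this merges points that land in the same grid cell. Two points closer than threshold can still both survive if they sit on opposite sides of a cell boundary.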

    I can add an example if needed, but first try these suggestions. I'm making a lot of guesses about your data, and I'd like to get some feedback before going further.