python numpy numpy-ndarray scikit-image

Is there an efficient way to use the content of one array as a lookup table to determine the content of a second array?


I have an array of points:

centers = array([[0,0,0,0],
                 [0,1,0,0],
                 [0,0,0,1],
                 [1,0,0,0]])

Using scipy.ndimage.distance_transform_edt() with return_indices=True, the index of the point closest to each background element is returned in a separate array:

indexes = array([[[1,1],[1,1],[1,1],[3,2]],
                 [[1,1],[1,1],[1,1],[3,2]],
                 [[0,3],[1,1],[3,2],[3,2]],
                 [[0,3],[0,3],[0,3],[3,2]]])

I then label my original array to have unique values:

labeled_centers = array([[0,0,0,0],
                         [0,2,0,0],
                         [0,0,0,3],
                         [4,0,0,0]])

My question is, what would be the most efficient way to use my array of indexes to create a new array, where every point is labeled with the label of the closest center point?

labeled_image = array([[2,2,2,3],
                       [2,2,2,3],
                       [4,2,3,3],
                       [4,4,4,3]])

So far I have achieved my desired result by looping through each value in the array, like so:

classed = np.zeros_like(centers)

for x in range(classed.shape[0]):
    for y in range(classed.shape[1]):
        classed[x,y] = labeled_centers[indexes[0,x,y],indexes[1,x,y]]

However, while my example is a small array, my real use case would involve arrays of millions if not billions of datapoints, so I am trying to avoid writing a per-value "for" loop. Is there a more efficient way to achieve the same result?


Solution

  • First, here is your code changed into something that actually runs:

    import numpy as np
    
    centers = np.array([[0,0,0,0],
                        [0,1,0,0],
                        [0,0,0,1],
                        [1,0,0,0]])
    
    indexes = np.array([[[1,1],[1,1],[1,1],[3,2]],
                        [[1,1],[1,1],[1,1],[3,2]],
                        [[0,3],[1,1],[3,2],[3,2]],
                        [[0,3],[0,3],[0,3],[3,2]]])
    
    labeled_centers = np.array([[0,0,0,0],
                                [0,2,0,0],
                                [0,0,0,3],
                                [4,0,0,0]])
    
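    classed = np.zeros_like(centers)
    
    # indexes[x,y] stores the nearest center as (column, row),
    # so the two entries are swapped when indexing labeled_centers
    for x in range(classed.shape[0]):
        for y in range(classed.shape[1]):
            classed[x,y] = labeled_centers[indexes[x,y,1],indexes[x,y,0]]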
    

    Now do the same with a simple, efficient, vectorized one-liner using advanced indexing:

    classed = labeled_centers[indexes[:,:,1],indexes[:,:,0]]
    print(classed)
    

    result:

    [[2 2 2 3]
     [2 2 2 3]
     [4 2 3 3]
     [4 4 4 3]]
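
The loop in the question indexes as indexes[0,x,y] and indexes[1,x,y], which suggests a coordinate-first array of shape (2, H, W). That is the layout scipy.ndimage.distance_transform_edt() produces with return_indices=True. As a minimal sketch under that assumption, the same lookup can be written by unpacking the first axis into a tuple. The name `indices` and the np.stack rearrangement are only illustrative; they reuse the example data above rather than calling SciPy.

```python
import numpy as np

labeled_centers = np.array([[0, 0, 0, 0],
                            [0, 2, 0, 0],
                            [0, 0, 0, 3],
                            [4, 0, 0, 0]])

# Example index array from above: shape (H, W, 2), each entry stored as (column, row).
indexes = np.array([[[1, 1], [1, 1], [1, 1], [3, 2]],
                    [[1, 1], [1, 1], [1, 1], [3, 2]],
                    [[0, 3], [1, 1], [3, 2], [3, 2]],
                    [[0, 3], [0, 3], [0, 3], [3, 2]]])

# Rearranged into the assumed coordinate-first layout, shape (2, H, W):
# indices[0] holds row indices, indices[1] holds column indices.
indices = np.stack([indexes[..., 1], indexes[..., 0]])

# A tuple of index arrays triggers advanced indexing, one array per axis.
classed = labeled_centers[tuple(indices)]
print(classed)
```

With the example data this gives the same array as the one-liner above. Because the lookup is a single advanced-indexing operation, no Python-level loop runs per element, which is what matters for arrays with millions of entries.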