Tags: python, arrays, numpy, scipy, ndimage

Faster index computation for a SciPy labelled array than np.where


I am working on a large array (3000 x 3000) on which I run scipy.ndimage.label. It returns the labelled array and the number of labels, which is 3403. I would like to know the indices of each label: for example, for label 1 I need the rows and columns where it occurs in the labelled array. Basically, like this:

a[0]   # the labelled array returned by scipy.ndimage.label
array([[1, 1, 0, 0],
       [1, 1, 0, 2],
       [0, 0, 0, 2],
       [3, 3, 0, 0]])

indices = [np.where(a[0] == t + 1) for t in range(a[1])]  # a[1] == 3 is the number of labels

print(indices)
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]

I would like to build such a list for all 3403 labels, but the method above is slow. I tried using a generator instead, but there is no noticeable improvement.

Is there a more efficient way?
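For reference, here is the question's loop as a self-contained Python 3 function. The labelled array is hard-coded (in practice it would be `a[0]` from `scipy.ndimage.label`), and the name `indices_by_where` is only illustrative. Each `np.where` call compares every pixel against one label, so the total work grows with (number of labels) x (number of pixels): roughly 3403 x 9,000,000, or about 3 x 10^10 comparisons, for the array in the question. A generator changes when each result is produced, not how much work it takes, which is consistent with seeing no speed-up.

```python
import numpy as np


def indices_by_where(labels, num_labels):
    """Baseline from the question: one full-array scan per label 1..num_labels."""
    return [np.where(labels == t + 1) for t in range(num_labels)]


# The sample labelled array from the question (a[0]); a[1] would be 3.
labels = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [3, 3, 0, 0]])

for lab, (rows, cols) in enumerate(indices_by_where(labels, 3), start=1):
    print(lab, rows.tolist(), cols.tolist())
```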


Solution

  • The way to gain efficiency is to minimize the work done inside the loop. A fully vectorized method isn't possible, because each label has a variable number of elements. With that in mind, here's one solution:

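    import numpy as np

    a_flattened = a[0].ravel()
    # A stable sort keeps each label's pixels in row-major order, as np.where does
    sidx = np.argsort(a_flattened, kind='stable')
    afs = a_flattened[sidx]
    # Boundaries between runs of equal label values in the sorted array
    cut_idx = np.r_[0, np.flatnonzero(afs[1:] != afs[:-1]) + 1, a_flattened.size]
    # Convert the sorted flat positions back to (row, col) coordinates
    row, col = np.unravel_index(sidx, a[0].shape)
    row_indices = [row[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    col_indices = [col[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]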
    

    Sample input and output:

    In [59]: a[0]
    Out[59]: 
    array([[1, 1, 0, 0],
           [1, 1, 0, 2],
           [0, 0, 0, 2],
           [3, 3, 0, 0]])
    
    In [60]: a[1]
    Out[60]: 3
    
    In [62]: row_indices # row indices
    Out[62]: 
    [array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
     array([0, 0, 1, 1]),             # for label-1
     array([1, 2]),                   # for label-2    
     array([3, 3])]                   # for label-3
    
    In [63]: col_indices  # column indices
    Out[63]: 
    [array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
     array([0, 1, 0, 1]),             # for label-1
     array([3, 3]),                   # for label-2
     array([0, 1])]                   # for label-3
    

    Apart from the first group, row_indices and col_indices contain the expected output. The first group in each list corresponds to label 0 (the background), so you will probably want to skip it, e.g. with row_indices[1:] and col_indices[1:].
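The sort-then-slice approach above can be wrapped in a reusable function and checked against the `np.where` loop. This is a sketch under a few assumptions of my own, not the answer's exact code: the helper name `label_indices` is hypothetical, it returns a dict keyed by label value rather than a list, it skips label 0 by default, and it uses `kind='stable'` in `np.argsort` so that each label's pixels come out in the same row-major order that `np.where` produces.

```python
import numpy as np


def label_indices(labels, skip_background=True):
    """Return {label: (rows, cols)} for a 2-D labelled array using one sort.

    Hypothetical helper sketching the sort-then-slice idea: sort the flat
    label array once, find where the label value changes, and slice the
    sorted pixel positions instead of scanning the array once per label.
    """
    flat = labels.ravel()
    if flat.size == 0:
        return {}
    # Stable sort keeps equal labels in row-major order, matching np.where.
    sidx = np.argsort(flat, kind='stable')
    sorted_labels = flat[sidx]
    # A new run starts at index 0 and wherever the sorted label changes.
    starts = np.flatnonzero(
        np.concatenate(([True], sorted_labels[1:] != sorted_labels[:-1])))
    ends = np.append(starts[1:], flat.size)
    # Convert the sorted flat positions back to (row, col) coordinates.
    rows, cols = np.unravel_index(sidx, labels.shape)
    out = {}
    for s, e in zip(starts, ends):
        lab = int(sorted_labels[s])
        if skip_background and lab == 0:
            continue  # label 0 is the background in scipy.ndimage.label output
        out[lab] = (rows[s:e], cols[s:e])
    return out


# Sanity check against the question's np.where loop. Random integers stand in
# for a real labelled image here (assumption: only the index logic is tested).
rng = np.random.default_rng(0)
labels = rng.integers(0, 50, size=(300, 300))
fast = label_indices(labels)
for lab, (r, c) in fast.items():
    r_ref, c_ref = np.where(labels == lab)
    assert np.array_equal(r, r_ref) and np.array_equal(c, c_ref)
```

The cost is one O(N log N) sort plus one cheap slice per label, instead of one full-array scan per label. Since `scipy.ndimage.label` numbers its labels consecutively, the run boundaries could instead be taken from the cumulative sum of `np.bincount(flat)`, which skips the comparison step. With SciPy 1.10 or newer, `scipy.ndimage.value_indices` is designed for this task and may be worth trying first.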