Faster alternative to np.where for computing label indices from a SciPy labelled array

I am working on a large array (3000 x 3000) on which I run scipy.ndimage.label. It returns the labelled array and the number of labels (3403). I would like to know the indices of each label, e.g. for label 1, the rows and columns where it occurs in the labelled array. So basically like this:

a[0] = array([[1, 1, 0, 0],
              [1, 1, 0, 2],
              [0, 0, 0, 2],
              [3, 3, 0, 0]])


indices = [np.where(a[0]==t+1) for t in range(a[1])]  # a[1] = 3 is the number of labels

print(indices)
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]

I would like to build a list of indices like the one above for all 3403 labels. The method above seems slow. I tried using generators, but that did not seem to improve things.

Is there a more efficient way?

Solution

The idea for gaining efficiency is to minimize the work done inside the loop. A fully vectorized method isn't possible, because each label has a variable number of elements. With those factors in mind, here's one solution: sort the flattened labels once so that equal labels become contiguous, then slice out each group -

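import numpy as np

a_flattened = a[0].ravel()
# Stable sort keeps each label's positions in row-major order
sidx = np.argsort(a_flattened, kind='stable')
afs = a_flattened[sidx]
# Boundaries where the sorted label value changes, plus both ends
cut_idx = np.r_[0,np.flatnonzero(afs[1:] != afs[:-1])+1,a_flattened.size]
row, col = np.unravel_index(sidx, a[0].shape)
# One slice per label, in increasing label order
row_indices = [row[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
col_indices = [col[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]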
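
To make the sort-and-cut step concrete, here is a small sketch that prints the intermediate arrays for the 4 x 4 sample from the question. The variable names labels, flat, order and sorted_labels are only illustrative, and a stable sort is assumed here so that ties keep their original order.

```python
import numpy as np

# Sample labelled array from the question (label 0 is the background)
labels = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [3, 3, 0, 0]])

flat = labels.ravel()
order = np.argsort(flat, kind="stable")   # flat positions, grouped by label
sorted_labels = flat[order]               # 8 zeros, 4 ones, 2 twos, 2 threes

# Start offset of every group in the sorted array, plus the final end offset
cut_idx = np.r_[0,
                np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1,
                flat.size]

print(sorted_labels.tolist())  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
print(cut_idx.tolist())        # [0, 8, 12, 14, 16]
```

Each consecutive pair in cut_idx delimits one label's run in the sorted array, which is why a single pass of slicing is enough afterwards.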

Sample input, output -

In [59]: a[0]
Out[59]: 
array([[1, 1, 0, 0],
       [1, 1, 0, 2],
       [0, 0, 0, 2],
       [3, 3, 0, 0]])

In [60]: a[1]
Out[60]: 3

In [62]: row_indices # row indices
Out[62]: 
[array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
 array([0, 0, 1, 1]),             # for label-1
 array([1, 2]),                   # for label-2    
 array([3, 3])]                   # for label-3

In [63]: col_indices  # column indices
Out[63]: 
[array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
 array([0, 1, 0, 1]),             # for label-1
 array([3, 3]),                   # for label-2
 array([0, 1])]                   # for label-3

The first group in row_indices and col_indices corresponds to label 0 (the background region), so you might want to skip it. The remaining groups, from index 1 onward, are the expected output.
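
As a minimal sketch, the steps above can be wrapped in a helper that drops the background group and returns per-label (rows, cols) pairs in the same form as np.where. The name label_indices is hypothetical, np.split is swapped in for the two list comprehensions, a stable sort is assumed so that positions within each label stay in row-major order, and the input is assumed to be non-empty.

```python
import numpy as np

def label_indices(labelled):
    """Hypothetical helper: map each nonzero label to its (rows, cols) arrays."""
    flat = labelled.ravel()
    # Stable sort groups equal labels and keeps row-major order inside a group
    order = np.argsort(flat, kind="stable")
    sorted_labels = flat[order]
    # Interior positions where the sorted label value changes
    cuts = np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1
    rows, cols = np.unravel_index(order, labelled.shape)
    # Label value of each group, read at the start of its run
    keys = sorted_labels[np.r_[0, cuts]]
    groups = zip(keys, np.split(rows, cuts), np.split(cols, cuts))
    # Skip label 0, which is the background
    return {int(k): (r, c) for k, r, c in groups if k != 0}

labels = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [3, 3, 0, 0]])

result = label_indices(labels)

# Compare against the original one-np.where-per-label approach
for lab in (1, 2, 3):
    expected_rows, expected_cols = np.where(labels == lab)
    assert np.array_equal(result[lab][0], expected_rows)
    assert np.array_equal(result[lab][1], expected_cols)
```

Keying the result by label value rather than by list position avoids an off-by-one if the array happens to contain no background pixels, in which case the first group would be label 1 instead of label 0.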