Search code examples
pythonnumpyperformance

Bottleneck using np.where (in term of computing ressources i.e. memory and speed)


In the current example, I spent (many) hours in order to find a way to sort the M matrix so that it corresponds exactly to the target one.

  • The 2 first columns are used to reorganize M
  • The matrixOrder indicates how to "proceed" (couples of values are important)
  • note the 2 couples [5, 15] and [5, 11] which must be tackled differently
  • at the final stage, the third column is extracted

For big arrays (hundreds thousands of lines, even millions), the amount of memory is huge and this step time is consuming (bottleneck).

# -*- coding: utf-8 -*-
import numpy as np

M = np.array([ [ 5  , 15 , 6 ],
               [ 14 , 15 , 14 ],
               [ 5  , 11 , 350 ],
               [ 5  , 11, 352 ],
               [ 5  , 11 , 351 ],
               [ 5  , 11 , 350 ],
               [ 9  , 11 , 351 ],
               [ 9  , 11 , 95 ],
               [ 9  , 11 , 353 ],
               [ 9  , 11 , 354 ],
               [ 28 , 15 , 28 ],
               [ 2  , 8 , 46 ],
               [ 2  , 8 , 353 ],
               [ 2  , 8 , 45 ],
               [ 21 , 15 , 21 ],
               [ 31 , 20 , 355 ],
               [ 31 , 20 , 358 ]])


matrixOrder = np.array([ [14, 15],
                         [2 , 8],
                         [31, 20],
                         [5 , 11],
                         [21, 15],
                         [9, 11],
                         [5, 15],
                         [28, 15] ])
                                           
Target = np.array([ [ 14 , 15 , 14 ],
                    [ 2 , 8 , 46 ],
                    [ 2 , 8 , 353 ],
                    [ 2 , 8 , 45 ],
                    [ 31 , 20 , 355 ],
                    [ 31 , 20 , 358 ],
                    [ 5 , 11 , 350 ],
                    [ 5 , 11 , 352 ],
                    [ 5 , 11 , 351 ],
                    [ 5 , 11 , 350 ],
                    [ 21 , 15 , 21 ],
                    [ 9 , 11 , 351 ],
                    [ 9 , 11 , 95 ],
                    [ 9 , 11 , 353 ],
                    [ 9 , 11 , 354 ],
                    [ 5 , 15 , 6 ],
                    [ 28 , 15 , 28 ] ])
                                        
index = np.where( (matrixOrder[:, 0].reshape(-1, 1) == M[:, 0].tolist()) &
                  (matrixOrder[:, 1].reshape(-1, 1) == M[:, 1].tolist()) )

T = M[index[1], :]
check = np.array_equal(Target, T)

Solution

  • Try doing a dictionary lookup instead of using np.where

    something like -

    def sort_mat(M, ord):
        # Convert order pairs to tuples
        k = [tuple(r) for r in ord]
        
        # Make lookup dictionary
        d = {k: i for i,k in enumerate(k)}
        
        # Get sorted indices
        idx = sorted(range(len(M)), key=lambda x: d.get((M[x,0], 
        M[x,1]), len(k)))
        
        #return the sorted matrix
        return M[idx]