Tags: python, python-3.x, performance, numpy, masking

How to accelerate numpy array masking?


I am profiling the performance of a piece of Python code using a line profiler.

In the code, I have a numpy array tt of shape (106906,) and dtype=int64. With the help of the profiler, I found that the second line below, mask[tt] = True, is quite slow. Is there any way to accelerate it? I am on Python 3, if that matters.

   mask = np.zeros(100000, dtype='bool')
   mask[tt] = True
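To profile this operation in isolation, a minimal reproducible setup could look like the following. It assumes `tt` holds uniformly random indices in `[0, 100000)` (the real contents of `tt` aren't given). As a sketch, it also checks that two pure-NumPy alternatives, `np.put` and an `np.bincount`-based mask, produce exactly the same mask as the fancy-index assignment. Whether either one is faster on your machine is not claimed here; the loop at the end just lets you measure it.

```python
import timeit

import numpy as np

N = 100_000  # mask length from the question
M = 106_906  # number of indices in tt, from the question

# Assumption: tt contains uniformly random indices in [0, N); the real tt may differ.
rng = np.random.default_rng(0)
tt = rng.integers(0, N, size=M, dtype=np.int64)


def fancy_index(tt, n=N):
    """The original approach: boolean mask via integer fancy indexing."""
    mask = np.zeros(n, dtype=bool)
    mask[tt] = True
    return mask


def with_put(tt, n=N):
    """Same result using np.put, which writes a value at each flat index."""
    mask = np.zeros(n, dtype=bool)
    np.put(mask, tt, True)
    return mask


def with_bincount(tt, n=N):
    """Same result by counting occurrences of each index and testing > 0."""
    return np.bincount(tt, minlength=n) > 0


# All three variants must agree before it makes sense to compare their speed.
reference = fancy_index(tt)
assert np.array_equal(reference, with_put(tt))
assert np.array_equal(reference, with_bincount(tt))

for fn in (fancy_index, with_put, with_bincount):
    # Best of 5 repeats, averaged over 20 calls each.
    seconds = min(timeit.repeat(lambda: fn(tt), number=20, repeat=5)) / 20
    print(f"{fn.__name__:>14}: {seconds * 1e6:8.1f} us per call")
```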

Solution

  • You can use Numba as @orlevii has suggested:

    import numpy as np
    from numba import njit

    @njit
    def f(mask, tt):
        mask[tt] = True

    # Test:
    mask = np.zeros(1000000, dtype='bool')
    tt = np.random.randint(0, 1000000, 106906)
    f(mask, tt)  # the first call also includes JIT compilation time
    

    A simple %%timeit check suggests that you should expect roughly 3 times faster execution than with the plain NumPy assignment.

    A further speed-up can be achieved by using the GPU. Here is an example with PyTorch:

    import torch
    # Create a boolean mask and int64 indices directly on the GPU
    mask = torch.zeros(1000000, dtype=torch.bool, device='cuda')
    tt = torch.randint(0, 1000000, (106906,), device='cuda')
    mask[tt] = True
    

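    Note that here we use a torch.Tensor, PyTorch's equivalent of numpy.ndarray. The code will run only if you have an NVIDIA GPU with CUDA. Expect roughly a 30x speed-up relative to your original code on a Tesla V100-SXM2. Because CUDA kernels run asynchronously, call torch.cuda.synchronize() before stopping the timer when you measure this yourself.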
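For the Numba route, a common way to write this kind of kernel is an explicit loop, which does not rely on Numba's support for fancy-index assignment. The sketch below is an illustration, not the answer's benchmarked code: `set_mask` is a hypothetical name, and it assumes every value in `tt` lies in `[0, len(mask))`, because Numba does not bounds-check array accesses by default. If Numba isn't installed, the sketch falls back to a no-op decorator so the same loop runs as plain (slow) Python and the logic can still be checked. With Numba present, the first call to an `@njit` function includes compilation, so call it once before timing it with %%timeit.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Illustration-only fallback so the sketch still runs without Numba:
    # the decorator becomes a no-op and the loop runs as plain Python.
    def njit(func):
        return func


@njit
def set_mask(mask, tt):
    """Hypothetical helper: set mask[tt[k]] = True for every k, in place.

    Assumes every value in tt lies in [0, mask.shape[0]); Numba does not
    bounds-check array accesses by default.
    """
    for k in range(tt.shape[0]):
        mask[tt[k]] = True


mask = np.zeros(1_000_000, dtype=np.bool_)
tt = np.random.randint(0, 1_000_000, 106_906)

set_mask(mask, tt)  # with Numba, this first call also pays the JIT compile cost

# Compare against the plain NumPy fancy-index assignment.
expected = np.zeros_like(mask)
expected[tt] = True
assert np.array_equal(mask, expected)
```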