python python-3.x performance numpy masking

How to accelerate numpy array masking?

I am profiling performance of a piece of Python code, using a line profiler.

In the code, I have a numpy array tt of shape (106906,) and dtype=int64. With the help of the profiler, I find that the second line below, mask[tt] = True, is quite slow. Is there any way to accelerate it? I am on Python 3, if that matters.

   mask = np.zeros(100000, dtype='bool')
   mask[tt] = True

Solution

You can use Numba as @orlevii has suggested:

import numpy as np
from numba import njit

@njit
def f(mask, tt):
    mask[tt] = True

# Test:
mask = np.zeros(1000000, dtype='bool')
tt = np.random.randint(0, 1000000, 106906)
f(mask, tt)  # the first call also includes JIT compilation time

A simple %%timeit check suggests that you should expect roughly 3 times faster execution.
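
If you want to reproduce the comparison outside of IPython, a minimal sketch along the following lines may help. It is not the exact benchmark behind the number above: the helper names (set_mask_numpy, set_mask_numba), the array size, and the repeat counts are illustrative assumptions, and the Numba variant is written as an explicit loop rather than as mask[tt] = True. Numba is treated as optional, and the compiled function is called once before timing so that JIT compilation is not counted.

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:  # Numba is optional in this sketch
    njit = None

N = 1_000_000  # assumed mask length, as in the test above
rng = np.random.default_rng(0)
tt = rng.integers(0, N, size=106906)  # int64 indices into the mask


def set_mask_numpy(mask, tt):
    # Baseline: plain NumPy fancy-index assignment
    mask[tt] = True


def _set_mask_loop(mask, tt):
    # Explicit loop; only fast when compiled by Numba
    for i in range(tt.shape[0]):
        mask[tt[i]] = True


# Hypothetical name for the compiled variant; None if Numba is missing
set_mask_numba = njit(_set_mask_loop) if njit is not None else None

mask_np = np.zeros(N, dtype=np.bool_)
set_mask_numpy(mask_np, tt)
t_np = min(timeit.repeat(lambda: set_mask_numpy(mask_np, tt),
                         number=20, repeat=3)) / 20
print(f"NumPy: {t_np * 1e6:.1f} us per call")

if set_mask_numba is not None:
    mask_nb = np.zeros(N, dtype=np.bool_)
    set_mask_numba(mask_nb, tt)  # warm-up call: triggers JIT compilation
    assert np.array_equal(mask_np, mask_nb)  # both variants must agree
    t_nb = min(timeit.repeat(lambda: set_mask_numba(mask_nb, tt),
                             number=20, repeat=3)) / 20
    print(f"Numba: {t_nb * 1e6:.1f} us per call")
```

The actual ratio depends on your machine, the array sizes, and the NumPy and Numba versions, so treat any single measurement as indicative only.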

Further speed-up can be achieved by utilizing the GPU. An example of how to do it with PyTorch:

import torch

# Allocate both tensors directly on the GPU
mask = torch.zeros(1000000, dtype=torch.bool, device='cuda')
tt = torch.randint(0, 1000000, (106906,), device='cuda')  # int64 indices
mask[tt] = True

Note that here we use a torch.Tensor object, which is PyTorch's equivalent of numpy.ndarray. The code will run only if you have an NVIDIA GPU with CUDA. Expect roughly a 30x speed-up over your original code on a Tesla V100-SXM2.