I am profiling the performance of a piece of Python code using a line profiler. In the code, I have a NumPy array `tt` of shape `(106906,)` and `dtype=int64`. With the help of the profiler, I find that the second line below, `mask[tt] = True`, is quite slow. Is there any way to accelerate it? I am on Python 3, if that matters.
```python
import numpy as np

mask = np.zeros(100000, dtype='bool')
mask[tt] = True
```
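The question does not show the contents of `tt`, so the sketch below for reproducing the baseline measurement assumes it holds random in-range indices (duplicates allowed). The helper name `time_baseline` is hypothetical. It times only the assignment, not the `np.zeros` allocation, which matches the line the profiler flagged.

```python
import timeit

import numpy as np

N = 100_000   # mask length from the question
K = 106_906   # number of indices, matching tt.shape
rng = np.random.default_rng(0)
# Assumption: tt holds random, in-range indices; the real data is not shown.
tt = rng.integers(0, N, size=K, dtype=np.int64)


def time_baseline(repeats=200):
    """Average seconds per `mask[tt] = True`, excluding the allocation."""
    mask = np.zeros(N, dtype=bool)

    def scatter():
        # Integer-array (fancy) indexing assignment; repeated indices just rewrite True.
        mask[tt] = True

    return timeit.timeit(scatter, number=repeats) / repeats


print(f"mask[tt] = True: {time_baseline() * 1e6:.1f} us per call")
```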
You can use Numba as @orlevii has suggested:
```python
import numpy as np
from numba import njit

@njit
def f(mask, tt):
    mask[tt] = True

# Test:
mask = np.zeros(1000000, dtype='bool')
tt = np.random.randint(0, 1000000, 106906)
f(mask, tt)  # the first call also JIT-compiles f
```
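Numba also compiles plain loops efficiently, so a variant (not part of the original answer) is to write the scatter as an explicit loop. This is a sketch: the name `set_mask_loop` is hypothetical, and the fallback decorator exists only so the snippet still runs where Numba is not installed (with no speed-up in that case). Numba does not bounds-check array accesses by default, so every value in `tt` must be a valid index into `mask`.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch runs without Numba (no speed-up then)
    def njit(func):
        return func


@njit
def set_mask_loop(mask, tt):
    # One write per index; duplicate indices simply rewrite True.
    for i in range(tt.shape[0]):
        mask[tt[i]] = True


mask = np.zeros(1_000_000, dtype=np.bool_)
tt = np.random.randint(0, 1_000_000, 106_906)
set_mask_loop(mask, tt)  # with Numba, the first call triggers JIT compilation
```

Whether the loop beats the fancy-indexing version of `f` depends on the Numba version and hardware, so it is worth timing both.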
A simple `%%timeit` check suggests that you should expect roughly 3 times faster execution. Note that the first call to `f` includes JIT compilation, so call it once before timing.
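Outside IPython, the same comparison can be made with the standard `timeit` module. A minimal sketch, assuming the same array sizes as the test above; the helper names `numpy_version` and `time_per_call` are hypothetical, the Numba fallback only keeps the snippet runnable without Numba, and the printed ratio will vary with hardware and with the values in `tt`.

```python
import timeit

import numpy as np

try:
    from numba import njit
except ImportError:  # run as plain NumPy if Numba is unavailable
    def njit(func):
        return func


@njit
def f(mask, tt):
    mask[tt] = True


def numpy_version(mask, tt):
    mask[tt] = True


def time_per_call(func, mask, tt, number=100):
    """Average seconds per call; warms up first so JIT compilation is not timed."""
    func(mask, tt)  # warm-up call (compiles f when Numba is present)
    return timeit.timeit(lambda: func(mask, tt), number=number) / number


mask = np.zeros(1_000_000, dtype=np.bool_)
tt = np.random.randint(0, 1_000_000, 106_906)
t_numpy = time_per_call(numpy_version, mask, tt)
t_numba = time_per_call(f, mask, tt)
print(f"NumPy: {t_numpy * 1e6:.1f} us, Numba: {t_numba * 1e6:.1f} us, "
      f"ratio: {t_numpy / t_numba:.1f}x")
```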
Further speed-up can be achieved by using the GPU. Here is an example with PyTorch:
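```python
import torch

# Boolean mask, matching the question's dtype='bool', allocated directly on the GPU.
mask = torch.zeros(1000000, dtype=torch.bool, device='cuda')
tt = torch.randint(0, 1000000, (106906,), device='cuda')  # int64 by default
mask[tt] = True
```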
Note that here we use a `torch.Tensor` object, which is PyTorch's equivalent of `numpy.ndarray`. The code will run only if you have an NVIDIA GPU with CUDA. Expect roughly a 30x speed-up relative to your original code on a Tesla V100-SXM2.
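In the question the data already lives in NumPy arrays, so in practice `tt` has to be copied to the GPU (and the mask possibly copied back), and those copies are not part of the snippet above. Also, CUDA kernels run asynchronously, so a fair timing must call `torch.cuda.synchronize()` before reading the clock. A sketch under those assumptions; the helper names `set_mask_torch` and `time_scatter_ms` are hypothetical, and it falls back to the CPU when no CUDA device is available, in which case no speed-up should be expected.

```python
import time

import numpy as np
import torch


def set_mask_torch(n, tt, device):
    """Return a NumPy bool mask of length n with mask[tt] = True, computed on `device`."""
    tt_dev = torch.from_numpy(tt).to(device=device, dtype=torch.long)  # copy to device (no-op on CPU)
    mask = torch.zeros(n, dtype=torch.bool, device=device)
    mask[tt_dev] = True
    return mask.cpu().numpy()  # copy back to host memory as a NumPy array


def time_scatter_ms(mask, idx, repeats=100):
    """Average milliseconds per `mask[idx] = True`, with both tensors already on the device."""
    mask[idx] = True  # warm-up so one-time CUDA initialisation is not timed
    if mask.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        mask[idx] = True
    if mask.is_cuda:
        torch.cuda.synchronize()  # kernels are asynchronous; wait for them to finish
    return (time.perf_counter() - start) / repeats * 1e3


device = "cuda" if torch.cuda.is_available() else "cpu"
rng = np.random.default_rng(0)
tt = rng.integers(0, 1_000_000, size=106_906, dtype=np.int64)
mask_np = set_mask_torch(1_000_000, tt, device)

mask = torch.zeros(1_000_000, dtype=torch.bool, device=device)
idx = torch.from_numpy(tt).to(device)
print(f"{device}: {time_scatter_ms(mask, idx):.3f} ms per assignment")
```

If the mask is needed back in NumPy after every assignment, the transfer time can dominate, so it is worth timing `set_mask_torch` end to end as well.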