I want to generate of matrix of size (500, 1000000) using beta.rvs function of scipy.stats.
I have tried multithreading but still it is taking too much time. Is there any other ways to reduce the execution time? Is there some other library which can do the same thing?
You can use numba to compute the values in parallel with prange:
import numpy as np
from scipy.stats import beta
import numba as nb
import matplotlib.pyplot as plt
a, b = 0.5, 10
N = 50, 1000000
result1 = beta.rvs(a, b, size=N)
@nb.njit(parallel=True, fastmath=True)
def beta_rvs(a, b, shape):
out = np.empty(shape, dtype=np.float64).reshape(-1)
size = np.asarray(shape).prod()
for i in nb.prange(size):
out[i] = np.random.beta(a, b)
return out.reshape(shape)
result2 = beta_rvs(a, b, N)
%timeit -n 1 -r 1 beta.rvs(a, b, size=N)
%timeit -n 1 -r 1 beta_rvs(a, b, N)
Output:
3.62 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
696 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Output with the requested shape:
39.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
6.75 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Histograms:
plt.hist(result1.ravel())
plt.hist(result2.ravel())