Search code examples
pythonnumpyperformancerandomscipy

Fastest way to generate a matrix using beta.rvs in python


I want to generate of matrix of size (500, 1000000) using beta.rvs function of scipy.stats.

I have tried multithreading but still it is taking too much time. Is there any other ways to reduce the execution time? Is there some other library which can do the same thing?


Solution

  • You can use numba to compute the values in parallel with prange:

    import numpy as np
    from scipy.stats import beta
    import numba as nb
    
    import matplotlib.pyplot as plt
    
    a, b = 0.5, 10
    
    N = 50, 1000000
    
    result1 = beta.rvs(a, b, size=N)
    
    @nb.njit(parallel=True, fastmath=True)
    def beta_rvs(a, b, shape):
        out = np.empty(shape, dtype=np.float64).reshape(-1)
        size = np.asarray(shape).prod()
        for i in nb.prange(size):
            out[i] = np.random.beta(a, b)
        return out.reshape(shape)
    
    result2 = beta_rvs(a, b, N)
    
    %timeit -n 1 -r 1 beta.rvs(a, b, size=N)
    %timeit -n 1 -r 1 beta_rvs(a, b, N)
    

    Output:

    3.62 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    696 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    

    Output with the requested shape:

    39.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    6.75 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
    

    Histograms:

    plt.hist(result1.ravel())
    plt.hist(result2.ravel())
    

    enter image description here enter image description here