Is there any way to speed up the computation time of calling a function with multiple time in python?

    import numpy as np
    import matplotlib.pyplot as plt
    from numpy import random
    import time
    from collections import Counter
   
    def simulation(N):
        I = 10
        success = 0
        M = 100
        for i in range(I):
            s = allocate(N,M)
            M -= s
            success += s
        return success

    def allocate(N,M):
        count = Counter(random.randint(N,size = M))
        success = sum(j for v,j in count.items() if j == 1)
        return success

    if __name__ == "__main__":
        start = time.perf_counter() 
        SAMPLE_SIZE = 100000
        N = np.linspace(5,45,41).astype(int)
        Ps = []
        for n in N:
            ps = []
            for _ in range(SAMPLE_SIZE):
                ps.append(simulation(n)/100)
            result = np.average(np.array(ps))
            Ps.append(result)

        elapsed = (time.perf_counter() - start)
        print("Time used:",elapsed)
        plt.scatter(N,Ps)
        plt.show()

Here is my situation. The ultimate goal is to set SAMPLE_SIZE to 10^7. However, when I set it to 10^5, it already requires about 1000sec to run it. Is there any way to make it more efficient and faster? Thanks for giving me suggestions.

Solution

First of all, the implementation of allocate is not very efficient: you can use vectorized Numpy function to do that:

def allocate(N, M):
    success = np.count_nonzero(np.bincount(random.randint(N, size=M)) == 1)
    return success

The thing is most of the time comes from the overhead of Numpy functions performing some checks and create some temporary arrays. You can use Numba to fix this problem:

import numba as nb

@nb.njit('int_(int_, int_)')
def allocate(N,M):
    tmp = np.zeros(N, np.int_)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

Then, you can speed up the code a bit further by using the Numba decorator @nb.njit('int_(int_)') to the simulation function so to avoid the overhead of calling Numba functions from the CPython interpreter.

Finally, you can speed up the main loop by running it in parallel with Numba (and also avoid the use of slow lists). You can also recycle the tmp array so not to cause too many allocations (that are expensive and do not scale with the number of cores). Here is the resulting final code:

import numpy as np
import matplotlib.pyplot as plt
import time
import numba as nb

# Recycle the `tmp` buffer not to do many allocations
@nb.njit('int_(int_, int_, int_[::1])')
def allocate(N, M, tmp):
    tmp.fill(0)
    for i in range(M):
        rnd = np.random.randint(0, N)
        tmp[rnd] += 1
    count = 0
    for i in range(N):
        count += tmp[i] == 1
    return count

@nb.njit('int_(int_)')
def simulation(N):
    I = 10
    success = 0
    M = 100
    tmp = np.zeros(N, np.int_) # Preallocated buffer
    for i in range(I):
        s = allocate(N, M, tmp)
        M -= s
        success += s
    return success

@nb.njit('float64(int_, int_)', parallel=True)
def compute_ps_avg(n, sample_size):
    ps = np.zeros(sample_size, dtype=np.float64)
    for i in nb.prange(sample_size):
        ps[i] = simulation(n) / 100.0
    # Note that np.average is not yet supported by Numba
    return np.mean(ps)

if __name__ == "__main__":
    start = time.perf_counter() 
    SAMPLE_SIZE = 100_000
    N = np.linspace(5,45,41).astype(int)
    Ps = [compute_ps_avg(n, SAMPLE_SIZE) for n in N]
    elapsed = (time.perf_counter() - start)
    print("Time used:",elapsed)
    plt.scatter(N,Ps)
    plt.show()

Here are performance results on my 10-core machine:

Initial code:          670.6 s
Optimized Numba code:    3.9 s

The resulting code is 172 times faster.

More than 80% of the time is spent in the generation of random numbers. Thus, if you want the code to be faster, one solution is to speed up the generation of random number using a SIMD-optimized random number generator. Unfortunately, AFAIK, this is not possible to (efficiently) achieve this in Python. You certainly need to use a native language like C or C++ to do that.