Search code examples
pythonnumpyscipy

Python - speeding up a for loop


So I am currently putting together a script which does the following steps:

  1. Simulate from a poisson distribution. Say this sample results in 5 from the poisson distribution
  2. From the value of the poisson distribution, simulate 5 values from a lognormal distribution
  3. Repeat the above process 100k+ times
  4. Apply some logic to these values to get some staistics out

The fastest way I think to do this is via a large matrix which has, down the rows each simulation and across the columns every value simulated from the lognormal distribution. With this matrix in place I can then apply the metrics I need.

Here's what I've done so far, which seems to work - however i'm a beginner at python and I think there should be a way to do this "vectorised" and without having to resort to a for-loop.

Any improvements in speed would be greatly appreciated as I ideally want to run this for 1m simulations:



import pandas as pd
import numpy as np
import os
import scipy.stats as sp


n_sims = 100000
MU = 3
poisson = sp.poisson(mu=MU)


#first want to sample 10000 numbers from a poisson ## this is step 1
no_clms_arr = poisson.rvs(size=n_sims)

#now for each element sampled sample a lognormal and store the results back to a matrix
result = np.zeros((n_sims,no_clms_arr.max()))

for idx,i in enumerate(no_clms_arr):
    if i == 0:
        continue # no need to do anything here
    sev_sample = sp.lognorm(50,25).rvs(i) ## this is step 2
    sev_sample = np.pad(sev_sample,(0,no_clms_arr.max()-i),'constant',constant_values=(0,0))  ## padding out so I can append back to the result matrix

    result[idx,:] = sev_sample


#now calculate some metrics ## step 4, example metric is below
lim = 5e6
xs = 2e6
np.clip(a = result-xs,a_min=0,a_max=lim).sum(axis=1).mean()

Solution

  • You can do it completely vectorized as the result array is unnecessary so you just need to generate your random numbers and then do the clip and sum:

    vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
    (vals - xs).clip(0, lim).sum() / n_sims
    

    You can speed up your method with a for loop (assuming it is necessary to make the array result) by generating all of your random numbers outside of the for loop in a single call, and then assign these values to the corresponding point in result with a for loop, ie replace your for loop with:

    vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
    
    start = 0
    
    for i, j in enumerate(no_clms_arr):
        if j:
            result[i, :j] = vals[start: start + j]
            start += j
    

    You can also replace the .sum(axis=1).mean() call with .sum()/n_sims for an additional speed up.

    Timings:

    def op(n_sims = 10000, mu = 3, lim=5e6, xs=2e6):
        poisson = sp.poisson(mu=mu)
        no_clms_arr = poisson.rvs(size=n_sims)
        result = np.zeros((n_sims,no_clms_arr.max()))
        for idx,i in enumerate(no_clms_arr):
            if i == 0:
                continue # no need to do anything here
            sev_sample = sp.lognorm(50,25).rvs(i) ## this is step 2
            sev_sample = np.pad(sev_sample,(0,no_clms_arr.max()-i),'constant',constant_values=(0,0))  ## padding out so I can append back to the result matrix
            result[idx,:] = sev_sample
        return np.clip(a = result-xs,a_min=0,a_max=lim).sum(axis=1).mean()
    
    def nin17_vectorized(n_sims = 10000, mu=3, lim=5e6, xs=2e6):
        poisson = sp.poisson(mu=mu)
        no_clms_arr = poisson.rvs(size=n_sims)
        vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
        return (vals - xs).clip(0, lim).sum() / n_sims
    
    
    def nin17_for(n_sims = 10000, mu=3, lim=5e6, xs=2e6):
        poisson = sp.poisson(mu=mu)
        no_clms_arr = poisson.rvs(size=n_sims)
        result = np.zeros((n_sims,no_clms_arr.max()))
        vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
        start = 0
        for i, j in enumerate(no_clms_arr):
            if j:
                result[i, :j] = vals[start: start + j]
                start += j
        return np.clip(a = result-xs,a_min=0,a_max=lim).sum() / n_sims
    
    state = np.random.get_state()
    
    np.random.set_state(state)
    print(op())
    np.random.set_state(state)
    print(nin17_for())
    np.random.set_state(state)
    print(nin17_vectorized())
    
    %timeit op()
    %timeit nin17_for()
    %timeit nin17_vectorized()
    

    Output:

    5682372.071238058
    5682372.071238058
    5682372.071238059
    2.54 s ± 8.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    5.62 ms ± 87.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    1.6 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)