I have the following function, which returns samples from a "mixed distribution":
# M   = [float]*k, center values of the component distributions
# S   = [float]*k, standard deviations
# P   = [float]*k, probability of each component (sums to 1.0)
# rng = random number generator
# n   = length of the returned array
# returns [float]*n
def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)
which is called like:
rng = np.random.default_rng()

def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)
with n=10**6.
However, the implementation is too slow: it currently takes around 350 s on my machine, and I need to get it down to approximately 30 s.
I am considering changing
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)
from a "for loop" to a "single NumPy call", but I cannot come up with a working solution.
Minimal working example
import numpy as np

rng = np.random.default_rng()

def mixed_normal(rng, n, M, S, P):
    # See https://en.wikipedia.org/wiki/Mixture_model
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    # Needs to be optimized
    return np.fromiter((rng.normal(M[i], S[i]) for i in idx), dtype=np.float64)

def mixed_normal_3(rng, n):
    data = [(-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)]
    return mixed_normal(rng, n, *data)

# [float]*(10**6) expected
print(mixed_normal_3(rng, 10**6))
I fixed it by 'pre-slicing' the lists M and S:
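def mixed_normal(rng, n, M, S, P):
    # One component index per sample; the result is already an integer array
    idx = np.random.choice(len(M), p=P, replace=True, size=n)
    # Expand M and S to length n by indexing, then draw all samples in one call
    return rng.normal(np.asarray(M)[idx], np.asarray(S)[idx], n)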
The lists M and S are expanded to size n by taking elements according to the random indices generated in idx, where idx also has size n:
M = np.array([0, 1, 2])
idx = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2])
M[idx]  # array([0, 0, 1, 1, 0, 0, 1, 1, 2])
These 'expanded' lists are then passed to the RNG.
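To illustrate how the expanded arrays are consumed, here is a small sketch using the parameters from the question. It relies on `rng.normal` accepting array-valued `loc` and `scale` and returning one draw per element. The seed and the five-element `idx` are arbitrary choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this example

M = np.array([-5.0, 0.0, 5.0])
S = np.array([1.0, 1.0, 1.0])
idx = np.array([0, 0, 1, 2, 2])  # illustrative component indices

loc = M[idx]    # per-sample means
scale = S[idx]  # per-sample standard deviations

# One call draws a sample for every (loc, scale) pair
draws = rng.normal(loc, scale)
print(draws.shape)  # (5,)

# With zero spread every draw collapses onto its mean,
# which makes the element-wise pairing visible
exact = rng.normal(loc, np.zeros_like(loc))
print(exact)
```

Because the parameters are paired element by element, the Python-level loop disappears and the sampling runs inside NumPy.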
This reduced the execution time from 350 s to 40 s for my test cases.
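As a possible refinement, the sketch below draws the component indices from the passed-in generator as well (instead of the global `np.random`), so that a seeded `rng` makes the whole result reproducible. It also compares the sample mean and variance with the theoretical mixture moments as a sanity check. The names `mixed_normal_vectorized` and `mixture_moments`, the seed, and the sample size are assumptions made for this example, and no timing claim is implied.

```python
import numpy as np

def mixed_normal_vectorized(rng, n, M, S, P):
    """Hypothetical variant: n draws from a normal mixture, using only `rng`."""
    M = np.asarray(M, dtype=np.float64)
    S = np.asarray(S, dtype=np.float64)
    # One component index per sample, drawn from the same generator
    idx = rng.choice(len(M), p=P, size=n)
    # Shift and scale standard normal draws by the per-sample parameters
    return M[idx] + S[idx] * rng.standard_normal(n)

def mixture_moments(M, S, P):
    """Theoretical mean and variance of the mixture."""
    M, S, P = (np.asarray(a, dtype=np.float64) for a in (M, S, P))
    mean = np.sum(P * M)
    var = np.sum(P * (S**2 + M**2)) - mean**2
    return mean, var

rng = np.random.default_rng(12345)  # seed chosen arbitrarily
M, S, P = (-5, 0, 5), (1, 1, 1), (1/3, 1/3, 1/3)

samples = mixed_normal_vectorized(rng, 200_000, M, S, P)
mean, var = mixture_moments(M, S, P)

# Sample statistics should be close to the theoretical values
print(samples.mean(), mean)
print(samples.var(), var)
```

Writing the draw as `M[idx] + S[idx] * rng.standard_normal(n)` is equivalent in distribution to `rng.normal(M[idx], S[idx])`; either form avoids the per-element Python loop.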