I am trying to generate datasets that follow a truncated negative binomial distribution, i.e. sets of numbers in which every value is at most a given max value.
def truncated_Nbinom(n, p, max_value, size):
    import scipy.stats as sct
    temp_size = size
    while True:
        temp_size *= 2
        temp = sct.nbinom.rvs(n, p, size=temp_size)
        truncated = temp[temp <= max_value]
        if len(truncated) >= size:
            return truncated[:size]
I am able to get results when max_value and n are small. However, when I try:

input_1 = truncated_Nbinom(99, 0.3, 99, 5000).tolist()

the kernel keeps dying. I tried changing the Python kernel's port and raising the recursion limit, but neither helped. Do you have any ideas to make my code faster?
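For what it's worth, the kernel death is most likely memory exhaustion rather than recursion: with n=99 and p=0.3 the negative binomial has mean n(1-p)/p = 231, so a draw lands at or below max_value=99 only very rarely, and the doubling loop requests ever-larger arrays until memory runs out. A quick sketch to check the acceptance probability of each rejection pass:

```python
from scipy import stats

# Probability that a single nbinom(99, 0.3) draw survives the cut at 99.
# The mean is n*(1-p)/p = 231, so 99 is far in the left tail.
accept = stats.nbinom.cdf(99, 99, 0.3)
print(accept)  # a tiny number, so temp_size must grow enormously to keep 5000 samples
```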
Here is one approach. You can compute the probability of each x under the negative binomial, then normalize the probabilities of the x values at or below max_value so they sum to one. Now you can simply call np.random.choice with those probabilities.
import numpy as np
import pandas as pd
from scipy import stats

def truncated_Nbinom2(n, p, max_value, size):
    support = np.arange(max_value + 1)       # all candidate values 0..max_value
    probs = stats.nbinom.pmf(support, n, p)  # untruncated PMF on the support
    probs /= probs.sum()                     # renormalize to the truncated distribution
    return np.random.choice(support, size=size, p=probs)
Here is an illustration:

arr1 = truncated_Nbinom(9, 0.3, 9, 50000)
arr2 = truncated_Nbinom2(9, 0.3, 9, 50000)

df_counts = pd.DataFrame({
    "version_1": pd.Series(arr1).value_counts(),
    "version_2": pd.Series(arr2).value_counts(),
})
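As a sanity check (a sketch re-using truncated_Nbinom2 from above), the parameters that killed the kernel now run instantly, since only the 100 support points 0..99 are ever materialized:

```python
import numpy as np
from scipy import stats

def truncated_Nbinom2(n, p, max_value, size):
    support = np.arange(max_value + 1)
    probs = stats.nbinom.pmf(support, n, p)
    probs /= probs.sum()
    return np.random.choice(support, size=size, p=probs)

# The case that crashed the rejection version: n=99, p=0.3, max_value=99.
input_1 = truncated_Nbinom2(99, 0.3, 99, 5000).tolist()
print(len(input_1), max(input_1))  # 5000 values, none above 99
```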