
Generating truncated negative binomial distribution in python


I am trying to generate datasets that follow a truncated negative binomial distribution, that is, samples in which no value exceeds a given maximum.

def truncated_Nbinom(n, p, max_value, size):
    import scipy.stats as sct
    temp_size = size
    while True:
        temp_size *= 2
        temp = sct.nbinom.rvs(n, p, size=temp_size)
        truncated = temp[temp <= max_value]
        if len(truncated) >= size:
            return truncated[:size]

I get results when max_value and n are small. However, when I try:

input_1= truncated_Nbinom(99, 0.3, 99, 5000).tolist()

The kernel keeps dying. I tried changing the Python port and raising the recursion limit, but neither helped. Do you have any ideas for making my code faster?


Solution

  • Here is one approach. Compute the negative binomial probability of every x from 0 up to and including max_value, then normalize those probabilities so that they sum to one. You can then simply call np.random.choice with the normalized probabilities.

    import numpy as np
    import pandas as pd
    from scipy import stats
    
    
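    def truncated_Nbinom2(n, p, max_value, size):
        # All values the truncated distribution can take: 0, 1, ..., max_value.
        support = np.arange(max_value + 1)
        # Untruncated negative binomial probabilities on that support.
        probs = stats.nbinom.pmf(support, n, p)
        # Renormalize so the truncated probabilities sum to one.
        probs /= probs.sum()
        return np.random.choice(support, size=size, p=probs)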
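
It may help to see why the rejection loop in the question struggles. With scipy's parameterization the untruncated mean is n(1 - p)/p, which is 231 for n = 99 and p = 0.3, so a draw at or below 99 is extremely rare. The loop therefore keeps doubling temp_size, and the kernel most likely dies because memory runs out, not because of recursion. A minimal sketch for checking this, where acceptance_probability is a hypothetical helper name and not part of the original code:

```python
from scipy import stats


def acceptance_probability(n, p, max_value):
    """Chance that one untruncated nbinom draw is <= max_value (hypothetical helper)."""
    return stats.nbinom.cdf(max_value, n, p)


# The small case from the illustration: a usable share of draws is kept.
small_case = acceptance_probability(9, 0.3, 9)
# The failing case from the question: almost every draw is rejected.
large_case = acceptance_probability(99, 0.3, 99)

print(f"n=9,  max_value=9:  {small_case:.3g}")
print(f"n=99, max_value=99: {large_case:.3g}")
```

Sampling directly from the normalized probabilities avoids the problem because no draw is ever discarded, however small the acceptance probability is.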
    

    Here is an illustration:

    arr1 = truncated_Nbinom(9, 0.3, 9, 50000)
    arr2 = truncated_Nbinom2(9, 0.3, 9, 50000)
    
    df_counts = pd.DataFrame({
        "version_1": pd.Series(arr1).value_counts(),
        "version_2": pd.Series(arr2).value_counts(),
    })
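
The same idea also handles the parameters from the question that killed the kernel. The sketch below is only an illustration under a few assumptions: truncated_nbinom_pmf is a hypothetical helper name, the seed is arbitrary, and np.random.default_rng is used in place of np.random.choice so the run is reproducible. It draws 5000 values and compares the observed frequencies with the normalized probabilities.

```python
import numpy as np
from scipy import stats


def truncated_nbinom_pmf(n, p, max_value):
    """Return the support 0..max_value and the renormalized nbinom probabilities."""
    support = np.arange(max_value + 1)
    probs = stats.nbinom.pmf(support, n, p)
    return support, probs / probs.sum()


rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
support, probs = truncated_nbinom_pmf(99, 0.3, 99)
sample = rng.choice(support, size=5000, p=probs)

# Observed relative frequency of every value in the support.
freq = np.bincount(sample, minlength=len(support)) / sample.size
max_abs_diff = np.abs(freq - probs).max()
print("largest gap between observed and target probability:", max_abs_diff)
```

Because the untruncated distribution is centered far above 99, most of the truncated mass sits just below the cutoff, so the sample clusters near max_value.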
    
