Search code examples
pythonpython-3.xprobability

Generate one sample following two different distributions


I would like to generate randomly a sample in the interval [0;1000] of size 1E4. But following two different distribution : one in the [0;1] interval following an increasing exponential distribution which will generate a bench of values close to 1 with respect to the others close to 0. And then, after to generate the second part of my sample in the [1;1000] interval following an 1/r distribution.

I think the easier way to do this is to split the global sample in two different samples. But I don't know how to deal with it. I tried to use some distribution of the scipy librairy but I didn't find a way to use them properly in order to generate my global sample. What do you think?


Solution

  • You can use two separate distributions but you need to make sure that the number of samples matches at the boundary point (r == 1). So you need to estimate the required number of samples for the left (exp) and right (1/r) distribution. The integrals over the two distributions give the following:

    This means the probability for the left distribution at r = 1 is exp(1) / 1.7183 and for the right distribution it is 1 / 6.9078. So this gives a ratio of ratio = left / right = 10.928. This means you need to generate ratio times more values in the right interval than in the left one in order to end up with the same number of samples at the boundary. Let's say you want to generate N samples in total this means you need to generate N1 = N / (ratio + 1) samples in the left (exp) interval and N2 = N * ratio / (ratio + 1) samples in the right (1/r) interval.

    So here's some sample code:

    import matplotlib.pyplot as plt
    import numpy as np
    
    r_max = 1000.0
    integral_1 = np.exp(1.0) - 1.0  # 1.7183
    integral_2 = np.log(r_max)      # 6.9078
    
    N = 2_000_000  # total number of samples
    ratio = np.exp(1.0) / integral_1 * integral_2  # 10.928
    N1 = int(N / (ratio + 1))
    N2 = int(N * ratio / (ratio + 1))
    
    # Use inverse transform sampling in the following.
    s1 = np.log(integral_1 * np.random.random(size=N1) + 1.0)
    s2 = np.exp(integral_2 * np.random.random(size=N2))
    
    samples = np.concatenate((s1, s2))
    np.random.shuffle(samples)  # optionally shuffle the samples
    
    plt.hist(samples, bins=int(20 * r_max))
    plt.xlim([0, 5])
    plt.show()
    

    Which produces the following distribution:

    Example distribution