Tags: python-3.x, random, statistics, numpy-ndarray, normal-distribution

How can I generate data that shows an inverted bell curve for a normal distribution?


I have generated random data that follows a normal distribution using the code below:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng()
number_of_rows = 10000
mu = 0
sigma = 1
data = rng.normal(loc=mu, scale=sigma, size=number_of_rows)

# sns.distplot(data, hist=False) is deprecated; sns.kdeplot draws the same density curve
dist_plot_data = sns.kdeplot(data)
plt.show()

The code above generates the distribution plot below, as expected:

[image: distribution plot showing a bell curve]

If I want a distribution plot that is exactly the inverse of this curve, like the one below, how can I generate the random data?

[image: distribution plot showing an inverted bell curve]

I want data whose distribution plot shows the inverted curve. How can I generate it?


Solution

  • Not sure how useful this is, but it's easy to do with rejection sampling. Borrowing the API from Peter O's previous solution, but working in blocks for performance, gives me:

    import numpy as np
    
    def invNormal(low, high, mu=0, sd=1, *, size=1, block_size=1024):
        remain = size
        result = []
        
        mul = -0.5 * sd**-2
    
        while remain:
            # draw next block of uniform variates within interval
            x = np.random.uniform(low, high, size=min((remain+5)*2, block_size))
            
            # reject proportional to normal density
            x = x[np.exp(mul*(x-mu)**2) < np.random.rand(*x.shape)]
            
            # make sure we don't add too much
            if remain < len(x):
                x = x[:remain]
    
            result.append(x)
            remain -= len(x)
    
        return np.concatenate(result)
    

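    This can be used as sns.histplot(invNormal(-4, 4, size=100_000), bins=51) (after import seaborn as sns), giving me:

    [image: histogram of the samples, low near the mean and high towards the edges]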
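For anyone who prefers the newer Generator API used in the question, here is a minimal sketch of the same rejection idea. The name inv_normal_rng and its rng parameter are my own assumptions, not part of the code above, and it skips the block-size tuning for brevity.

```python
import numpy as np

def inv_normal_rng(low, high, mu=0.0, sd=1.0, *, size=1, rng=None):
    """Rejection-sample an inverted bell curve on [low, high] (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    chunks = [np.empty(0)]  # lets size=0 return an empty array
    remain = size
    while remain > 0:
        # propose uniformly; oversample a bit because some proposals are rejected
        x = rng.uniform(low, high, size=2 * remain + 10)
        # keep x with probability 1 - exp(-(x - mu)^2 / (2 sd^2))
        keep = rng.random(x.shape) > np.exp(-0.5 * ((x - mu) / sd) ** 2)
        x = x[keep][:remain]
        chunks.append(x)
        remain -= len(x)
    return np.concatenate(chunks)

# seeded generator so the draw is reproducible
samples = inv_normal_rng(-4, 4, size=100_000, rng=np.random.default_rng(0))

# fraction of samples within 0.5 of the mean
near_centre = np.mean(np.abs(samples) < 0.5)
```

As a quick sanity check, near_centre should come out well below the 12.5% that a plain uniform draw on [-4, 4] would give, since proposals close to the mean are almost always rejected.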

    Note that probability densities have to integrate to 1, so the "wider" you make the range, the smaller the densities will be (i.e. you can't have a density of 0.4 on the y-axis if the range on the x-axis is [-4, +4]). Also, a KDE feels less useful here because it will struggle with the discontinuity at the edges.
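To put a number on the point about densities, here is a rough sketch that normalizes the shape 1 - exp(-(x - mu)^2 / (2 sd^2)) over [low, high] using the closed form of the Gaussian integral. The function names are hypothetical, and it assumes the target shape is exactly the one accepted by the rejection step.

```python
import math

def _phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_normal_pdf(x, low, high, mu=0.0, sd=1.0):
    """Normalized density of the inverted bell curve on [low, high]."""
    if not low <= x <= high:
        return 0.0
    # area under exp(-(x - mu)^2 / (2 sd^2)) between low and high
    bell_area = sd * math.sqrt(2.0 * math.pi) * (
        _phi((high - mu) / sd) - _phi((low - mu) / sd)
    )
    # area under 1 - exp(...) is the interval width minus the bell area
    norm = (high - low) - bell_area
    return (1.0 - math.exp(-0.5 * ((x - mu) / sd) ** 2)) / norm

# density at the edge of [-4, 4], where the curve is highest
peak = inv_normal_pdf(4.0, -4.0, 4.0)
```

With the defaults, peak works out to roughly 0.18, so on [-4, +4] the curve tops out at less than half of the 0.4 seen at the centre of the standard normal plot.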