python numpy statistics normal-distribution

Generate random normal distribution with kurtosis greater than 3


A normal distribution has a kurtosis of 3. With an increase in outliers in the distribution, the tails become "fat" and the kurtosis increases beyond 3.

How do I generate a random distribution between two numbers with kurtosis greater than 3 (preferably around 5-7)?

Imports

import numpy as np
import scipy.stats
from random import gauss
from scipy.stats import kurtosis

Random Uniform between 0.01-0.10

# Random Uniform Distribution
runif = np.random.uniform(0.01, 0.10, 10000)

kurtosis(runif, fisher=False)

1.8124891901330156


Random Normal between 0.01-0.10

lower = 0.01
upper = 0.10
mu = (upper)/2
sigma = 0.01
N = 10000
retstats = scipy.stats.truncnorm.rvs((lower-mu)/sigma,(upper-mu)/sigma,loc=mu,scale=sigma,size=N)

mean = .05
stdev = .01  # mean +/- 3 stdev is 0.02-0.08, so at least 99.73% of samples fall in the desired range

values = [gauss(mean, stdev) for _ in range(10000)]

kurtosis(values, fisher=False)

3.015004351756201


Random Normal with fat-tails between 0.01-0.10

???


Solution

  • A normal distribution always has a kurtosis of 3, and a uniform distribution has a kurtosis of 9/5. Long-tailed distributions have a kurtosis above 3; the Laplace distribution, for instance, has a kurtosis of 6. [Note that tables usually list excess kurtosis, which equals the kurtosis minus 3.] See the table here: http://mathworld.wolfram.com/KurtosisExcess.html

    Cutting off the tails, however, only reduces the kurtosis, so it is impossible to get a kurtosis above 3 by truncating a normal distribution. To generate a distribution with a limited range and high kurtosis, start from a long-tailed (not normal) distribution and make sure the cut has a minimal effect on its tails. Colloquially, you need a very spiky distribution. I produce one below using a Laplace distribution with a small scale (exponential decay) parameter.

    import numpy as np
    from scipy.stats import kurtosis

    min_range = 0.01
    max_range = 0.10
    midpoint = (max_range + min_range) / 2
    samples = 10000

    def filter_tails(x):
        # Keep only the values inside [min_range, max_range]
        return x[(x >= min_range) & (x <= max_range)]

    # Uniform: already inside the range, so nothing is filtered out.
    # fisher=False returns the plain (Pearson) kurtosis, where normal = 3.
    runif = np.random.uniform(min_range, max_range, samples)
    value = kurtosis(filter_tails(runif), fisher=False)
    print(f"uniform kurtosis = {value}")

    # Gaussian centred on the midpoint; the range limits sit 4.5 sigma away
    sigma = 0.01
    rnorm = np.random.normal(midpoint, sigma, samples)
    value = kurtosis(filter_tails(rnorm), fisher=False)
    print(f"gaussian kurtosis = {value}")

    # Laplace with a small scale, so the cut barely touches the tails
    exponential_decay = 0.001
    rlaplace = np.random.laplace(midpoint, exponential_decay, samples)
    value = kurtosis(filter_tails(rlaplace), fisher=False)
    print(f"laplace kurtosis = {value}")
    

    Running the script, I get:

    uniform kurtosis = 1.8011863970680828
    gaussian kurtosis = 3.0335178694177785
    laplace kurtosis = 5.76290423111418
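
To see how much the cut matters, here is a minimal sketch that repeats the Laplace experiment for several scale values. The helper name `truncated_laplace_kurtosis`, the seed, the sample count and the list of scales are my own choices for illustration, not part of the script above. It also reports the fraction of samples that survive the filter, so you can tell when the tails are really being cut.

```python
import numpy as np
from scipy.stats import kurtosis


def truncated_laplace_kurtosis(scale, low=0.01, high=0.10, n=200_000, seed=0):
    """Return (Pearson kurtosis, fraction kept) for Laplace samples
    centred on the midpoint of [low, high] and filtered to that range.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    midpoint = (low + high) / 2
    x = rng.laplace(midpoint, scale, n)
    kept = x[(x >= low) & (x <= high)]
    return kurtosis(kept, fisher=False), kept.size / n


# Small scales leave the tails intact; large scales cut deep into them.
for scale in (0.001, 0.005, 0.02, 0.05):
    k, frac = truncated_laplace_kurtosis(scale)
    print(f"scale={scale}: kurtosis={k:.2f}, kept={frac:.1%}")
```

With a small scale almost every sample is kept and the kurtosis should stay near the Laplace value of 6. As the scale grows, more of the tails is discarded and the kurtosis should fall below 3, drifting toward the uniform value of 9/5. That suggests treating the scale as the knob for tuning the kurtosis inside a fixed range.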