Search code examples
pythonnumpyrandomstatisticsstandard-deviation

Generating random numbers with upper and lower standard deviation of the scatter


I have to produce randomly generated, normally distributed numbers, based on an astronomical table containing a most probable value and a standard deviation. The peculiar thing is that the standard deviation is not given by one, but two numbers - an upper standard deviation of the error and a lower, something like this:

mass_object, error_up, error_down
7.33, 0.12, 0.07
9.40, 0.04, 0.02
6.01, 0.11, 0.09
...

For example, this means for the first object that if a random mass m gets generated with m<7.33, then probably it will be further away from 7.33 than in the case that m>7.33. So I am looking now for a way to randomly generate the numbers and this way has to include the 2 possible standard deviations. If I was dealing with just one standard deviation per object, I would create the random number (mass) of the first object like that:

mass_random = np.random.normal(loc=7.33, scale=0.12)

Do you have ideas how to create these random numbers with upper and lower standard deviation of the scatter? Tnx


Solution

  • As we discussed in the comments, a normal distribution has the same standard deviation in each direction (it's symmetric around the mean). So we know our distribution won't be normal. We can try a lognormal approach, since this allows us to introduce the idea of skewness. To do this in Python, you'll need Scipy. Here's a crude approach, assuming that 68% of data is on the mean, 16% is at the high point, and 16% is at the low point. We fit the distribution to that crude dataset, then we can calculate new points from the distribution:

    import scipy.stats
    
    #  Choose one of the rows
    mean, high, low = 7.33, 0.12, 0.07
    
    #  Create a dummy dataset to fit the distribution
    values = [mean] * 68 + [mean + high] * 16 + [mean - low ] * 16
    
    # Print the fit distribution
    fit_dist = scipy.stats.lognorm.fit(values)
    print(fit_dist)
    
    #  Calculate 10 new random values based on the fit
    scipy.stats.lognorm.rvs(*fit_dist, size=10)
    
    array([7.25541865, 7.34873107, 7.33831589, 7.36387121, 7.26912469,
           7.33084677, 7.35626689, 7.33907124, 7.32522422, 7.31688687])