Search code examples
rrandomdistributionnormal-distribution

Drawing from truncated normal distribution delivers wrong standard deviation in R


I draw random numbers from a truncated normal distribution. The truncated normal distribution is supposed to have mean 100 and standard deviation 60 after truncation at 0 from the left. I computed an algorithm to compute the mean and sd of the normal distribution prior to the truncation (mean_old and sd_old). The function vtruncnorm gives me the (wanted) variance of 60^2. However, when I draw random variables from the distribution, the standard deviation is around 96. I don't understand why the sd of the random variables varies from the computation of 60.

I tried increasing the amount of draws - still results in sd around 96.

 require(truncnorm)
 mean_old = -5425.078
 sd_old = 745.7254
 val = rtruncnorm(10000, a=0,  mean = mean_old, sd = sd_old)
 sd(val)
 sqrt(vtruncnorm( a=0,  mean = mean_old, sd = sd_old))

Solution

  • Ok, I did quick test

    require(truncnorm)
    
    val = rtruncnorm(1000000, a=7.2,  mean = 0.0, sd = 1.0)
    sd(val)
    sqrt(vtruncnorm( a=7.2,  mean = 0.0, sd = 1.0))
    

    Canonical truncated gaussian. At a=6 they are very close, 0.1554233 vs 0.1548865 f.e., depending on seed etc. At a = 7 they are systematically different, 0.1358143 vs 0.1428084 (sampled value is smaller that function call). I've checked with Python implementation

    import numpy as np
    from scipy.stats import truncnorm
    
    a, b = 7.0, 100.0
    
    mean, var, skew, kurt = truncnorm.stats(a, b, moments='mvsk')
    
    print(np.sqrt(var))
    
    r = truncnorm.rvs(a, b, size=100000)
    print(np.sqrt(np.var(r)))
    

    and got back 0.1428083662823426 which is consistent with R vtruncnorm result. At your a=7.2 or so results are even worse.

    Moral of the story - at high a values sampling from rtruncnorm has a bug. Python has the same problem as well.