Search code examples
rgamlss

How to generate random numbers from the best fitted distribution with gamlss?


Objective

Find the best fitted distribution for data and then generate random numbers from this distribution.

Example

Using the gamlss package in R, I found that the best fit distribution is "Skew exponential power (Azzalini type 1)":

library(gamlss)
library(gamlss.dist)
library(gamlss.add)


m1 <- fitDist(mtcars$wt, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)

  
summary(m1)
# *******************************************************************
#   Family:  c("SEP1", "Skew exponential power (Azzalini type 1)") 
# 
# Call:  gamlssML(formula = y, family = DIST[i]) 
# 
# Fitting method: "nlminb" 
# 
# 
# Coefficient(s):
#   Estimate   Std. Error     t value Pr(>|t|)
# eta.mu     3.440000000  0.000149516 23007.56651  < 2e-16
# eta.sigma -1.856665040  0.861826733    -2.15434 0.031214
# eta.nu     0.150728244           NA          NA       NA
# eta.tau   -3.524272086           NA          NA       NA
# 
# eta.mu    ***
#   eta.sigma *  
#   eta.nu       
# eta.tau      
# ---
#   Signif. codes:  
#   0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Degrees of Freedom for the fit: 4 Residual Deg. of Freedom   28 
# Global Deviance:     -27.4747 
# AIC:     -19.4747 
# SBC:     -13.6117 
# Warning message:
#   In sqrt(diag(object$vcov)) : NaNs produced

Error

But the values of sigma and tau are negative. When I provide these values to rSEP1() to generate a random number, it throws the following error:

rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5)
# Error in rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5) : 
#   sigma must be positive

Are these values transformed? How can I provide correct inputs to rSEP1()?


Solution

  • If you look at the link functions for the parameters, you will see:

    SEP1()
    
    #> GAMLSS Family: SEP1 Skew exponential power (Azzalini type 1) 
    #> Link function for mu   : identity 
    #> Link function for sigma: log 
    #> Link function for nu   : identity 
    #> Link function for tau  : log 
    

    So the values for sigma and tau that fitDist returns are the log of the numbers you would plug into rSEP1. In other words, the correct way to run rSEP1 from your model is:

    rSEP1(1, mu = 3.44, sigma = exp(-1.8), nu = 0.15, tau = exp(-3.5))
    #> [1] 3.460991
    

    To show that this is the case, let us create a set of random numbers from a defined rSEP1 where sigma = 0.5, and tau = 0.5. If we then find the best-fitting distribution, we should get results close to log(0.5) for both sigma and tau. Since log(0.5) is about -0.69, we would expect values of about -0.69 for both these parameters:

    set.seed(1)
    testvec <- rSEP1(10000, mu = 3.5, sigma = 0.5, nu = 2.5, tau = 0.5)
    m2 <- fitDist(testvec, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)
    
    m2
    #> Family:  c("SEP1", "Skew exponential power (Azzalini type 1)") 
    #> Fitting method: "nlminb" 
    #> 
    #> Call:  gamlssML(formula = y, family = DIST[i]) 
    #> 
    #> Mu Coefficients:
    #> [1]  3.5
    #> Sigma Coefficients:
    #> [1]  -0.6828
    #> Nu Coefficients:
    #> [1]  2.421
    #> Tau Coefficients:
    #> [1]  -0.6952
    

    And plugging the exponents of sigma and tau into dSEP1 gives us a near-perfect fit to the density of our test vector:

    d <- density(testvec)
    plot(d$x, d$y)
    lines(d$x, dSEP1(d$x, mu = 3.5, sigma = exp(-0.68), nu = 2.42, tau = exp(-0.69)),
         col = "red", lwd = 2, lty = 2)
    

    enter image description here

    It's worth pointing out that the actual fit obtained for mtcars$wt in the example is pretty terrible. The small value of sigma jyst means that most random values drawn from the distribution will be close to the mean of mtcars$wt. There are only 32 points in the data set, which makes it very difficult to automatically fit a parametric distribution accurately.