Search code examples
rcurve-fitting

Fitting "multimodal" lognormal distributions to data using R


I am trying to replicate the problem described in Fitting "multimodal" lognormal distributions to data using python using R.

y = c(196, 486, 968, 2262, 3321, 4203, 15072, 46789, 95201, 
      303494, 421484, 327507, 138931, 27973)
bins = c(0.0150, 0.0306, 0.0548, 0.0944, 0.1540, 0.2560, 
         0.3830, 0.6050, 0.9510, 1.6400, 2.4800, 3.6700, 
         5.3800, 9.9100, 15)

getMeans = function(x){
  count=length(x)
  result=c()
  for(i in 2:count){
    result=c(result,mean(c(x[i-1],x[i])))
  }
  result
}

x_plot = getMeans(bins)

x = x_plot

plot(x,y,log = 'xy',type = 'l')

Is there something similar to lmfit.Models in R as well as shown in the answer.


Solution

  • In R we can get the density of a lognormal distribution with dlnorm. To get a mixture of 2 lognormal distributions over a vector x we would need to be able to control the mean, standard deviation, and amplitude of both distributions independently. This requires the estimation of 6 different parameters.

    If we specify the function as

    logmix <- function(x, A1, A2, M1, M2, S1, S2) {
      
      result <- A1 * (dlnorm(x, M1, S1) + A2 * dlnorm(x, M2, S2))
      
      ifelse(is.na(result), 0, result)
    }
    

    Then we can use non-linear least squares to estimate the parameters:

    model <- nls(log(y) ~ logmix(x, A1, A2, M1, M2, S1, S2), 
                 start = list(A1 = 1e5, A2 = 1e2, M1 = log(0.25), M2 = log(5), 
                              S1 = 0.8, S2 = 0.8),
                 control = list(maxiter = 1000))
    

    Now reviewing our coefficients, we have:

    coefs <- as.list(round(coef(model), 3))
    
    coefs
    #> $A1
    #> [1] 16.658
    #> 
    #> $A2
    #> [1] 62.233
    #> 
    #> $M1
    #> [1] 1.35
    #> 
    #> $M2
    #> [1] 4.499
    #> 
    #> $S1
    #> [1] 1.979
    #> 
    #> $S2
    #> [1] 1.742
    

    We can put these in an expression to show the formula we would use to get our model:

    ex <- substitute(exp(A1*(dlnorm(x, M1, S1) + A2*dlnorm(x, M2, S2))), env = coefs)
    
    ex
    #> exp(16.658 * (dlnorm(x, 1.35, 1.979) + 62.233 * dlnorm(x, 4.499, 1.742)))
    

    And furthermore, we can write this expression as a function that we can use to estimate y for a given x:

    modeller <- as.function(c(alist(x=), ex))
    

    To show this works, let's plot the original data and our model as a line over it:

    plot(x,y, log = "xy")
    plot(modeller, lty = 2, add = TRUE, from = min(x), to = max(x))
    

    Created on 2023-01-11 with reprex v2.0.2