
nonlinear regression with exp(-exp(-x/c)) as model (R,nls)


I am having a hard time fitting the following model to the data below:

y = a * exp( - b * exp( - x/c))

DF <- data.frame(x = seq(200, 7000, by = 100))  # x = 200, 300, ..., 7000
DF$y=c(50,150,350,550,1050,1650,2950,4750,7850,12350,18950,27250,36750,49750,63250,79050,95450,112850,134550,158050,184650,211150,237750,270150,299650,334850,373450,413050,453050,490350,534250,574050,622750,666550,707350,760250,803050,848650,893850,928250,973850,1006250,1047650,1075850,1113850,1146350,1180150,1212650,1243950,1275850,1306250,1332150,1372350,1402550,1440650,1471550,1503550,1549850,1583150,1628850,1664250,1711250,1746850,1793250,1837950,1884750,1930850,1976750,2008650)

What I actually need is a plot with y on a log10 scale. When I use ggplot with stat_smooth(method='nls',...), the fit is very poor, and I do not know how to improve it. Can you suggest a reasonable approach? Thanks!

My code so far:

ggplot(DF, aes(x = x, y = y)) + geom_point() + 
stat_smooth(method = 'nls', 
    formula = y ~ a * exp(- b * exp( - x / 50 )), 
    aes(colour = 'Exponential'), 
    se = FALSE,   start = list(a=1000,b=10)) +
    scale_y_continuous(trans='log10') 

Solution

  • tl;dr You are trying to fit a Gompertz curve, and R has an SSgompertz function that does the trick.

    Data:

    x <- seq(200,7000,by=100)
    y <- c(50,150,350,550,1050,1650,2950,4750,7850,12350,18950,27250,36750,49750,63250,79050,95450,112850,134550,158050,184650,211150,237750,270150,299650,334850,373450,413050,453050,490350,534250,574050,622750,666550,707350,760250,803050,848650,893850,928250,973850,1006250,1047650,1075850,1113850,1146350,1180150,1212650,1243950,1275850,1306250,1332150,1372350,1402550,1440650,1471550,1503550,1549850,1583150,1628850,1664250,1711250,1746850,1793250,1837950,1884750,1930850,1976750,2008650)
    DF <- data.frame(x,y)
    

    The parameterization of R's self-starting Gompertz function (?SSgompertz) is a*exp(-b2*b3^x) = a*exp(-b2*exp(x*log(b3))) = a*exp(-b2*exp(-x/c)) with c = -1/log(b3). Here the fitted b3 is just below 1, so log(b3) is negative and c comes out positive.

    library("ggplot2"); theme_set(theme_bw())
    n1 <- nls(y ~ SSgompertz(x, p1,p2,p3 ), data=DF)
    coef(n1)
    ##           p1           p2          p3
    ## 2.530746e+06 6.907014e+00 9.995405e-01
    

    Translating this into your terms:

    coef2 <- with(as.list(coef(n1)),
                c(a=p1,b=p2,c=-1/log(p3)))
    coef2
    
    ##             a            b             c 
    ##  2.530746e+06 6.907014e+00  2.175813e+03 
    

    Check:

    tmpf0 <- function(x) {
        with(as.list(coef(n1)),p1*exp(-p2*p3^x))
    }
    
    tmpf <- function(x) {
        y <- with(as.list(coef2),a*exp(-b*exp(-x/c)))
        print(range(y))
        return(y)
    }
    tmpf0(200)  ## 4645.226
    tmpf(200)   ## ditto
    

    I can easily draw the plot from the predicted values, but for reasons I can't currently figure out, I can't (1) embed the fit within ggplot2 or (2) use stat_function() with tmpf to add the results to the plot.

    DF$pred <- predict(n1)
    g0 <- ggplot(DF,aes(x,y))+geom_point()
    g0+ geom_line(aes(y=pred),colour="red") +
            scale_y_log10()
    

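
A possible workaround for the two sticking points above, offered as a sketch rather than a definitive fix. ggplot2 applies scale transformations before it computes statistics, so with scale_y_log10() a stat_smooth(method = 'nls') layer sees log10(y) rather than y, which is one likely reason an embedded fit misbehaves. Fitting outside ggplot2 and predicting on a dense grid sidesteps that. The names fit and grid, the grid size of 400, and the parameter names Asym, b2, b3 are illustrative choices, not part of the original answer. The plotting step is skipped if ggplot2 is not installed.

```r
# Data from the question: x runs from 200 to 7000 in steps of 100.
x <- seq(200, 7000, by = 100)
y <- c(50,150,350,550,1050,1650,2950,4750,7850,12350,
       18950,27250,36750,49750,63250,79050,95450,112850,134550,158050,
       184650,211150,237750,270150,299650,334850,373450,413050,453050,490350,
       534250,574050,622750,666550,707350,760250,803050,848650,893850,928250,
       973850,1006250,1047650,1075850,1113850,1146350,1180150,1212650,1243950,1275850,
       1306250,1332150,1372350,1402550,1440650,1471550,1503550,1549850,1583150,1628850,
       1664250,1711250,1746850,1793250,1837950,1884750,1930850,1976750,2008650)
DF <- data.frame(x = x, y = y)

# Fit on the untransformed y, using the self-starting Gompertz model
# (no start values needed).
fit <- nls(y ~ SSgompertz(x, Asym, b2, b3), data = DF)

# Evaluate the fitted curve on a dense grid (400 points is an arbitrary choice).
grid <- data.frame(x = seq(min(DF$x), max(DF$x), length.out = 400))
grid$pred <- predict(fit, newdata = grid)

# Draw points plus the precomputed curve; the log10 scale then only affects
# the display, not the fit. Skipped when ggplot2 is unavailable.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(DF, aes(x, y)) +
    geom_point() +
    geom_line(data = grid, aes(y = pred), colour = "red") +
    scale_y_log10()
  print(p)
}
```

Because geom_line() only needs the precomputed pred column, the log scale changes how the curve is drawn but not how it is fitted.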