Search code examples
rggplot2nls

How to add a logarithmic nonlinear fit to ggplot?


I'd like to fit the logarithmic curve through my data using nls.

library(dplyr)
library(ggplot2)
a <- 3
b <- 2
Y <- data_frame(x = c(0.2, 0.5, 1, 2, 5, 10),
                y = a + b*log(x))

Y %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(shape = 19, size = 2) +
  geom_smooth(method = "nls",
              formula = y ~ p1 + p2*log(x),
              start = list(a = a, b = b),
              se = FALSE,
              control = list(maxiter = 100))

This gives me an error:

Error in method(formula, data = data, weights = weight, ...) : number of iterations exceeded maximum of 100

What is going wrong?


Solution

  • Here's some text I copied and pasted after doing ?nls:

    Warning

    Do not use nls on artificial "zero-residual" data.

    The nls function uses a relative-offset convergence criterion that compares the numerical imprecision at the current parameter estimates to the residual sum-of-squares. This performs well on data of the form

    y = f(x, θ) + eps

    (with var(eps) > 0). It fails to indicate convergence on data of the form

    y = f(x, θ)

    because the criterion amounts to comparing two components of the round-off error. If you wish to test nls on artificial data please add a noise component, as shown in the example below.

    That inspired me to try this:

    > library(dplyr)
    > library(ggplot2)
    > a <- 3
    > b <- 2
    > Y <- data_frame(x = c(0.2, 0.5, 1, 2, 5, 10),
    +                 y = a + b*log(x)*(1 + rnorm(length(x), sd=0.001)))
    > Y %>%
    +   ggplot(aes(x = x, y = y)) +
    +   geom_point(shape = 19, size = 2) +
    +   geom_smooth(method = "nls",
    +               formula = y ~ p1 + p2*log(x),
    +               start = list(p1 = a, p2 = b),
    +               se = FALSE,
    +               control = list(maxiter = 100))
    

    Note: your code had start = list(a=a, b=b) which is a typo because a and b aren't defined in your formula. Aside from that, adding the *(1 + rnorm(length(x), sd=0.001)) is the only thing I did.

    The resulting graph made it seem like everything worked fine.

    Plot of results

    I'd generally recommend doing the fit separately, however, and then plotting it with predict. That way you can always check the quality of the fit to see if it worked before plotting.

    > fit <- nls(data=Y, formula = y ~ p1 + p2*log(x), start = list(p1 = a, p2 = b))
    > summary(fit)
    
    Formula: y ~ p1 + p2 * log(x)
    
    Parameters:
       Estimate Std. Error t value Pr(>|t|)    
    p1 3.001926   0.001538    1952 4.14e-13 ***
    p2 1.999604   0.001114    1795 5.78e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.003619 on 4 degrees of freedom
    
    Number of iterations to convergence: 1 
    Achieved convergence tolerance: 1.623e-08
    
    > new_x = data.frame(x=seq(from=0.2, to=10, length.out=100))
    > ggplot(data=Y, aes(x=x, y=y)) + 
           geom_point() + 
           geom_line(data=new_x, 
                     aes(x=new_x, y=predict(fit, newdata=new_x)), 
                     color='blue')