Search code examples
rnls

nls failing on some subsets of data, but not on other, similar, subsets


I'm trying to apply an nls function to data by year, so there will be a separate nls function for each year. All the years are broadly similar (exponential decay), but some years the nls() function fails with a "singular gradient" error.

data that is working:

good_data = data.frame(y = c(8.46,6.87,5.81,6.62,5.85,5.79,4.83,4.94,4.95,5.27,5.05,5.38,5.08,3.98),
                       x = c(2,6,6,7,7,8,9,10,12,13,14,15,16,17))

data that is failing:

bad_data = data.frame(y = c(8.99,5.86,5.32,5.74,5.41,5.04,4.66,4.52,4.18,4.66,5.38,5.46,5.21,5.37,4.89),
                      x = c(2,6,6,7,7,8,9,10,11,12,13,14,15,16,17))

attempted nls:

fit = nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data)

To my eyes, the two sets of data look very similar. Is there some way I can diagnose why the one is failing and the other isn't? Is there something I can do to fix it?

Thanks


Solution

  • Below we show 2 approaches to this. If you want to do this automatically you might want to try a straight foward fit and if that fails then try (2) and if that fails try (1). If they all fail then the data may not really follow the model and should not be fit with it.

    Another possibility that may avoid the iterative attempts at different methods if the data are all sufficiently similar is to fit all the data first and then fit each data set using the starting values from that. For that see (3).

    1) If you add more points by doing a spline fit first then it converges:

    sp <- with(bad_data, spline(x, y))
    fit2sp <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = sp)
    fit2sp
    

    giving:

    Nonlinear regression model
      model: y ~ SSasymp(x, Asym, R0, lrc)
       data: sp
       Asym      R0     lrc 
     5.0101 22.1915 -0.2958 
     residual sum-of-squares: 5.365
    
    Number of iterations to convergence: 0 
    Achieved convergence tolerance: 1.442e-06
    

    2) Another approach if the data are similar is to use the starting values from a prior successful fit.

    fit1 <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data)
    fit2 <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = bad_data, start = coef(fit1))
    fit2
    

    giving:

    Nonlinear regression model
      model: y ~ SSasymp(x, Asym, R0, lrc)
       data: bad_data
       Asym      R0     lrc 
     4.9379 15.5472 -0.7369 
     residual sum-of-squares: 2.245
    
    Number of iterations to convergence: 10 
    Achieved convergence tolerance: 7.456e-06
    

    Below we plot both solutions:

    plot(y ~ x, bad_data)
    points(y ~ x, sp, pch = 20)
    lines(fitted(fit2sp) ~ x, sp, col = "red")
    lines(fitted(fit2) ~ x, bad_data, col = "blue", lty = 2)
    legend("topright", c("data", "spline", "fit2sp", "fit2"), 
      pch = c(1, 20, NA, NA), lty = c(NA, NA, 1, 2), 
      col = c("black", "black", "red", "blue"))
    

    3) Another approach that may work if all the data is sufficiently similar is to fit all the data and then the individual data sets using the starting values from all the data.

    all_data <- rbind(good_data, bad_data)
    fitall <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = all_data)
    fit1a <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data, start = coef(fitall))
    fit2a <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = bad_data, start = coef(fitall))
    

    screenshot