I'm trying to apply an nls function to data by year, so there will be a separate nls function for each year. All the years are broadly similar (exponential decay), but some years the nls() function fails with a "singular gradient" error.
data that is working:
good_data = data.frame(y = c(8.46,6.87,5.81,6.62,5.85,5.79,4.83,4.94,4.95,5.27,5.05,5.38,5.08,3.98),
x = c(2,6,6,7,7,8,9,10,12,13,14,15,16,17))
data that is failing:
bad_data = data.frame(y = c(8.99,5.86,5.32,5.74,5.41,5.04,4.66,4.52,4.18,4.66,5.38,5.46,5.21,5.37,4.89),
x = c(2,6,6,7,7,8,9,10,11,12,13,14,15,16,17))
attempted nls:
fit = nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data)
To my eyes, the two sets of data look very similar. Is there some way I can diagnose why the one is failing and the other isn't? Is there something I can do to fix it?
Thanks
Below we show 2 approaches to this. If you want to do this automatically you might want to try a straight foward fit and if that fails then try (2) and if that fails try (1). If they all fail then the data may not really follow the model and should not be fit with it.
Another possibility that may avoid the iterative attempts at different methods if the data are all sufficiently similar is to fit all the data first and then fit each data set using the starting values from that. For that see (3).
1) If you add more points by doing a spline fit first then it converges:
sp <- with(bad_data, spline(x, y))
fit2sp <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = sp)
fit2sp
giving:
Nonlinear regression model
model: y ~ SSasymp(x, Asym, R0, lrc)
data: sp
Asym R0 lrc
5.0101 22.1915 -0.2958
residual sum-of-squares: 5.365
Number of iterations to convergence: 0
Achieved convergence tolerance: 1.442e-06
2) Another approach if the data are similar is to use the starting values from a prior successful fit.
fit1 <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data)
fit2 <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = bad_data, start = coef(fit1))
fit2
giving:
Nonlinear regression model
model: y ~ SSasymp(x, Asym, R0, lrc)
data: bad_data
Asym R0 lrc
4.9379 15.5472 -0.7369
residual sum-of-squares: 2.245
Number of iterations to convergence: 10
Achieved convergence tolerance: 7.456e-06
Below we plot both solutions:
plot(y ~ x, bad_data)
points(y ~ x, sp, pch = 20)
lines(fitted(fit2sp) ~ x, sp, col = "red")
lines(fitted(fit2) ~ x, bad_data, col = "blue", lty = 2)
legend("topright", c("data", "spline", "fit2sp", "fit2"),
pch = c(1, 20, NA, NA), lty = c(NA, NA, 1, 2),
col = c("black", "black", "red", "blue"))
3) Another approach that may work if all the data is sufficiently similar is to fit all the data and then the individual data sets using the starting values from all the data.
all_data <- rbind(good_data, bad_data)
fitall <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = all_data)
fit1a <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = good_data, start = coef(fitall))
fit2a <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = bad_data, start = coef(fitall))