I have two variables whose relationship I'm trying to model so that I can extract the residuals. The relationship between the two variables is clearly a non-linear, exponential one. I've tried a few different approaches with nls, but I keep getting different error messages.
# dataset
df <- structure(list(y = c(464208.56, 334962.43, 361295.68, 426535.68, 258843.93, 272855.46,
166322.72, 244695.28, 227003.03, 190728.4, 156025.45, 72594.24, 56911.4, 175328.95, 161199.76,
152520.77, 190610.57, 60734.34, 31620.9, 74518.86, 45524.49, 2950.58, 2986.38, 15961.77, 12484.05,
6828.41, 2511.72, 1656.12, 5271.4, 7550.66, 3357.71, 3620.43, 3699.85, 3337.56, 4106.55, 3526.66,
2996.79, 1649.89, 4561.64, 1724.25, 3877.2, 4426.69, 8557.61, 6021.61, 6074.17, 4072.77, 4032.95,
5280.16, 7127.22),
x = c(39.23, 38.89, 38.63, 38.44, 38.32, 38.27, 38.3, 38.4, 38.56, 38.79, 39.06, 39.36, 39.68,
40.01, 40.34, 40.68, 41.05, 41.46, 41.93, 42.48, 43.14, 43.92, 44.84, 45.9, 47.1, 48.4, 49.78,
51.2, 52.62, 54.01, 55.31, 56.52, 57.6, 58.54, 59.33, 59.98, 60.46, 60.78, 60.94, 60.92, 60.71,
60.3, 59.69, 58.87, 57.86, 56.67, 55.33, 53.87, 52.33)),
row.names = c(NA, -49L),
class = c("tbl_df", "tbl", "data.frame"),
na.action = structure(c(`1` = 1L, `51` = 51L),
class = "omit"))
# initial model
m <- nls(y ~ a * exp(r * x),
start = list(a = 0.5, r = -0.2),
data = df)
Error in nls(y ~ a * exp(r * x), start = list(a = 0.5, r = -0.2), data = df, : singular gradient
# try specifying the algorithm
m <- nls(y ~ a * exp(r * x),
start = list(a = 0.5, r = -0.2),
data = df,
alg = "plinear")
Error in nls(y ~ a * exp(r * x), start = list(a = 0.5, r = -0.2), data = df, :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
As @Gregor Thomas suggests, you could linearize your problem (fit a log-linear regression), at the cost of changing the error model. (Basic model diagnostics, i.e. a scale-location plot, suggest that this would be a much better statistical model!) However, you can fit the exponential mean efficiently without changing the error structure by fitting a log-link Gaussian GLM:
m1 <- glm(y ~ x, family = gaussian(link = "log"), data = df)
The model is y ~ Normal(exp(b0 + b1*x), s), so a = exp(b0) and r = b1.
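To make that mapping concrete, here is a small self-contained sketch on simulated data (not the df above; the parameter values a = 3e5 on the shifted scale and r = -0.35 are assumed for illustration) showing that the log-link GLM coefficients translate directly into a and r:

```r
## Sketch on simulated data: a log-link Gaussian GLM recovers the
## exponential-model parameters via a = exp(b0), r = b1
set.seed(101)
x <- seq(38, 60, length.out = 50)
y <- 3e5 * exp(-0.35 * (x - 38)) * exp(rnorm(50, sd = 0.1))
fit <- glm(y ~ x, family = gaussian(link = "log"))
a_start <- exp(coef(fit)[["(Intercept)"]])  # estimate of a (unshifted scale)
r_start <- coef(fit)[["x"]]                 # estimate of r, should be near -0.35
```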
I tried using list(a = exp(coef(m1)[1]), r = coef(m1)[2]) as starting values, but even this was too finicky for nls().
There are two ways to get nls to work.
First, as @GregorThomas suggests, shifting the x-origin to x = 38 works fine (given a sensible starting value):
m <- nls(y ~ a * exp(r * (x-38)),
start = list(a = 3e5, r = -0.35),
data = df)
Second, the deriv function will generate a function with the right structure for nls (it returns the objective function, with a "gradient" attribute giving the matrix of partial derivatives) if you ask it nicely. (I'm also using the exponentiated intercept from the log-Gaussian GLM as a starting value ...)
f <- deriv( ~ a*exp(r*x), c("a", "r"), function.arg = c("x", "a", "r"))
m2 <- nls(y ~ f(x, a, r),
start = list(a = exp(coef(m1)[1]), r = -0.35),
data = df)
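If you're curious what deriv() actually builds, here is a tiny self-contained sketch (evaluation points chosen arbitrarily) showing the returned value and its "gradient" attribute:

```r
## Sketch: deriv() builds a function that returns the model values
## with a "gradient" attribute -- one column of partials per parameter
g <- deriv(~ a * exp(r * x), c("a", "r"), function.arg = c("x", "a", "r"))
val <- g(x = c(0, 1), a = 2, r = -0.5)
as.numeric(val)         # model values: 2 * exp(-0.5 * x)
attr(val, "gradient")   # columns: d/da = exp(r*x), d/dr = a*x*exp(r*x)
```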
We can plot these to compare the predictions (visually identical):
par(las = 1, bty = "l")
xvec <- seq(38, 60, length = 101)
plot(y ~ x, df)
lines(xvec, predict(m1, newdata = data.frame(x=xvec), type = "response"),
col = 2)
lines(xvec, predict(m, newdata = data.frame(x=xvec)), col = 4, lty = 2)
lines(xvec, predict(m2, newdata = data.frame(x=xvec)), col = 5, lty = 2)
With a little bit of extra work (exponentiating the intercept from the Gaussian GLM, shifting the x-origin back to zero for the nls fit) we can compare the coefficients (they only agree up to a tolerance of 2e-4, but that should be good enough, right?):
a1 <- exp(coef(m1)[[1]])
a2 <- coef(m)[[1]]*exp(-38*coef(m)[[2]])
all.equal(c(a = a1, r = coef(m)[[2]]),
c(a = a2, r = coef(m1)[[2]]), tolerance = 1e-4)
all.equal(c(a = a1, r = coef(m)[[2]]),
coef(m2), tolerance = 2e-4)
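Since the original goal was extracting residuals: once any of these fits converges, residuals() gives them directly. A self-contained sketch on simulated data (same assumed shifted-origin parameterization as above):

```r
## Sketch (simulated data): fit the shifted-origin exponential model
## and extract the residuals, which was the original goal
set.seed(101)
x <- seq(38, 60, length.out = 50)
y <- 3e5 * exp(-0.35 * (x - 38)) + rnorm(50, sd = 500)
fit <- nls(y ~ a * exp(r * (x - 38)), start = list(a = 3e5, r = -0.35))
res <- residuals(fit)   # observed minus fitted, one per data point
```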