I have data with climatological pressure values and the heights of the stations where the pressure was measured. I would like to fit an exponential model to them. The data look like this:
x
[1] 539 575 1320 438 1840 496 316 552 325 386 1599 1073 556 1029 1661
[16] 2472 1594 1197 910 1035 596 646 420 516 1980 1045 2287 440 419 1611
[31] 577 3580 484 1018 1669 745 1974 366 273 454 203 588 1427 405 1403
[46] 485 490 2106 990 3305 1078 455 300 1638 1708 438 1303 482 775 2502
[61] 457 2690 422 1638 555 426
y
[1] 954.1 951.4 867.2 964.0 813.3 958.8 979.7 950.8 978.4 971.3 835.1 894.1
[13] 952.0 904.4 833.3 751.5 839.0 882.5 912.0 899.4 947.1 942.3 968.5 961.9
[25] 801.3 893.6 769.8 965.6 965.1 836.9 949.2 653.6 959.8 900.2 830.6 928.6
[37] 800.3 971.1 983.5 963.4 992.6 947.5 848.3 969.4 858.2 959.9 959.3 787.2
[49] 900.4 677.6 893.2 962.7 981.5 834.9 827.0 966.0 870.1 961.1 925.2 749.3
[61] 962.8 734.0 968.0 836.3 950.4 966.5
I have first tried just to take the logarithms of the data and fit an lm
them:
log.p=log(y)
log.height=log(x)
lmlog=lm(log.p~log.height)
But as this delivered a model that does not fit at all, I've decided to use the nls
function having taken various tips from other posts (e.g. "start"):
f <- function(x,a,b) {a* exp(b * x)}
dat <- as.list(x, y)
start <- coef(nls(log(y) ~ log(f(x, a, b)), dat, start = c(a = 1, b = -1)))
nls=nls(y~ f(x,a,b), data=dat, start=start)
Unfortunately, even for "start", the following errors appear and I really don't know what to do anymore...
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Can anyone help? Thanks in advance!!!
Transforming both variables doesn't linearize the relationship:
plot(log(y)~log(x))
Instead you should only transform y
:
plot(log(y)~x)
modlm <- lm(log(y)~x)
abline(modlm)
summary(modlm)
Call:
lm(formula = log(y) ~ x)
Residuals:
Min 1Q Median 3Q Max
-0.0081825 -0.0009194 0.0000952 0.0008455 0.0070058
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.927e+00 4.567e-04 15166.4 <2e-16 ***
x -1.227e-04 3.516e-07 -349.1 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002185 on 64 degrees of freedom
Multiple R-squared: 0.9995, Adjusted R-squared: 0.9995
F-statistic: 1.219e+05 on 1 and 64 DF, p-value: < 2.2e-16
Of course you can use nls
too:
modnls <- nls(y~exp(a+b*x), start=list(a=coef(modlm)[[1]], b=coef(modlm)[[2]]))
plot(y~x)
xnew <- seq(min(x), max(x), by=0.5)
lines(xnew, exp(coef(modnls)[[1]]+xnew*coef(modnls)[[2]]))
summary(modnls)
Formula: y ~ exp(a + b * x)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 6.926e+00 4.384e-04 15797.5 <2e-16 ***
b -1.225e-04 3.831e-07 -319.7 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.904 on 64 degrees of freedom
Number of iterations to convergence: 2
Achieved convergence tolerance: 6.25e-08
Note how the parameter estimates of the linear and the nonlinear fit are very similar. Usually, you should choose whether to use lm
with transformed data or nls
with untransformed data depending on the error distribution. However, since the relationship between y
and x
is not very nonlinear, it doesn't matter so much here.