My data consist of two columns, time and cumulative number, as shown below:
time <- c(1:14)
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)
My nonlinear function is:
B/(B*C*exp(-A*B*time) + 1)
My objective is to fit my data by nonlinear regression with nls()
and to find confidence intervals for the fitted values. I have tried the following:
m1 <- nls(cum.num ~ B/((B*C)*exp(-A*B*time) + 1),start=list(A=0.001,B=1000,C=0.5))
I tried the following to compute the fitted values of my model:
predict(m1,interval="predict")
I got only the fitted values, without lower and upper confidence bounds:
[1] 116.9912 145.7954 181.1951 224.4367 276.8663 339.8665 414.7550
[8] 502.6399 604.2369 719.6632 848.2417 988.3638 1137.4632 1292.1377
My questions are:
a) Is there any way I can compute lower and upper bounds for the fitted values? (For an lm() fit,
predict() returns the fitted value with lower and upper bounds when interval is specified.)
b) Suppose I have new time values like:
new.time<-c(15:20)
Can I compute the predicted value of cum.num at new.time along with lower and upper bounds?
Thank You so much for your HELP!!!!
In your example, the model does not seem to fit the data well, and the sample size is small. This usually means that something is wrong, and you should revise the model before doing any further analysis. Still, here is a way to compute a "confidence interval" with the bootstrap method, although it may not be valid in this case.
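One way to see the lack of fit is to put the observed and fitted values side by side. The following is only a sketch: the data frame name fit.check is made up for illustration, and warnOnly = TRUE is added as an assumption so that the fit returns an object even if it does not fully converge.

```r
time <- 1:14
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)

# Same model and starting values as in the question
m1 <- nls(cum.num ~ B/((B*C)*exp(-A*B*time) + 1),
          start = list(A = 0.001, B = 1000, C = 0.5),
          control = nls.control(maxiter = 500, warnOnly = TRUE))

# fit.check is a hypothetical name: observed vs. fitted values and their difference
fit.check <- data.frame(time = time,
                        observed = cum.num,
                        fitted = as.numeric(fitted(m1)))
fit.check$residual <- fit.check$observed - fit.check$fitted
fit.check
```

In the output posted in the question, the fitted value at time 1 is about 117 while the observed value is 20, so large residuals at the early time points are to be expected.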
These are the data we need:
time <- c(1:14)
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)
new.time <- c(15:20)
all.time <- c(time, new.time)
We may give them other names, which are helpful for more general usage:
y <- cum.num       # the dependent variable values from the data
x <- time          # the independent variable values from the data
new.x <- all.time  # the independent variable values over which we want to predict
Here is the nonlinear least-squares fit used in this case; for general use, replace the formula and the starting values with your own:
nls(y ~ B/((B*C)*exp(-A*B*x) + 1), start=list(A=0.001,B=1000,C=0.5),
control = nls.control(maxiter = 500, warnOnly = TRUE))
Based on the model, we can define an estimate function that generates a vector of fitted values and predictions for a randomly generated index. Its argument is a vector of sample indices. Inside the function, the model is fitted to the sample selected by those indices, and the fitted model is then used to generate the fitted values and predictions (since the question asks for a CI for both).
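estimate <- function(ind){
  # Select the resampled observations
  x <- x[ind]
  y <- y[ind]
  # Refit the model on the resampled data
  m1 <- nls(y ~ B/((B*C)*exp(-A*B*x) + 1), start=list(A=0.001,B=1000,C=0.5),
            control = nls.control(maxiter = 500, warnOnly = TRUE))
  # Fitted values and predictions over new.x
  predict(m1, newdata = list(x = new.x))
}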
m1 <- nls(cum.num ~ B/((B*C)*exp(-A*B*time) + 1),start=list(A=0.001,B=1000,C=0.5))
predict0 <- predict(m1, newdata = list(time = all.time))
predict1 <- replicate(1000, estimate(sample.int(14, replace = TRUE)))
intervals <- apply(predict1, 1, quantile, probs = c(0.05, 0.95))
rbind(predict0, intervals)
predict1 is a matrix that stores the bootstrap results, with one row per time point and one column per bootstrap replicate.
Each bootstrap sample has the same size as the original sample (14 in this example) and is drawn from the original sample by simple random sampling with replacement, so sample.int(14, replace = TRUE) is used to generate the indices for each bootstrap sample. The estimate function then generates a vector of fitted values and predictions for each randomly generated index vector.
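To make the resampling step concrete, here is a minimal sketch of a single bootstrap sample. The seed and the name boot.sample are assumptions added for illustration; they are not part of the code above.

```r
set.seed(1)  # assumed seed, only so that the sketch is reproducible

time <- 1:14
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)

# Draw 14 row indices with replacement: some rows repeat, others are left out
ind <- sample.int(length(time), replace = TRUE)

# boot.sample is a hypothetical name; each (time, cum.num) pair stays together
boot.sample <- data.frame(time = time[ind], cum.num = cum.num[ind])
boot.sample
```

Because whole rows are resampled, a bootstrap sample can miss some time points entirely, which is probably one reason some of the refits behave badly with only 14 observations.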
Since predict1 holds the bootstrapped fitted values and predictions, I calculate the 90% CI from the bootstrap estimates as their 5% and 95% quantiles. During the bootstrap procedure, the nls function emits many warnings, which suggests numerical problems; this is consistent with the small sample size and the poor fit of the model. The final results look like this:
> rbind(predict0, intervals)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
predict0 116.99118 145.79538 181.1951 224.4367 276.8663 339.8665 414.7550 502.6399 604.2369
5% 39.22272 67.34464 111.2190 173.7619 231.7736 289.7346 358.8469 436.2569 524.8187
95% 162.92948 190.60295 224.2462 266.1298 314.1032 392.3228 504.1270 611.3698 704.2803
[,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
predict0 719.6632 848.2417 988.3638 1137.4632 1292.1377 1448.4271 1602.2033 1749.5981 1887.374
5% 627.1981 739.8984 822.7940 838.2366 846.9043 851.8955 854.2859 855.8558 856.873
95% 799.1904 923.1220 1068.4667 1231.6091 1416.4405 1631.2212 1900.6581 2220.5415 2617.839
[,19] [,20]
predict0 2013.1701 2125.5890
5% 857.4619 857.8027
95% 3072.8531 3594.9036
>
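If some resamples make nls stop with an error instead of a warning, the whole replicate() call fails. The following sketch shows one possible way to guard against that and to collect the intervals in a data frame. The names safe.estimate, boot, ci and result are hypothetical, and the seed and the 200 replicates are assumptions chosen to keep the sketch quick.

```r
time <- 1:14
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)
all.time <- 1:20

# Hypothetical helper: like estimate(), but returns NA for a resample
# whose fit stops with an error
safe.estimate <- function(ind) {
  d <- data.frame(x = time[ind], y = cum.num[ind])
  tryCatch({
    fit <- suppressWarnings(
      nls(y ~ B/((B*C)*exp(-A*B*x) + 1), data = d,
          start = list(A = 0.001, B = 1000, C = 0.5),
          control = nls.control(maxiter = 500, warnOnly = TRUE)))
    as.numeric(predict(fit, newdata = data.frame(x = all.time)))
  }, error = function(e) rep(NA_real_, length(all.time)))
}

set.seed(1)  # assumed seed, for reproducibility only
boot <- replicate(200, safe.estimate(sample.int(length(time), replace = TRUE)))

# 5% and 95% quantiles per time point, ignoring failed resamples
ci <- apply(boot, 1, quantile, probs = c(0.05, 0.95), na.rm = TRUE)
result <- data.frame(time = all.time, lower = ci[1, ], upper = ci[2, ])
result
```

Dropping failed resamples can bias the interval, so it is worth checking how many columns of boot are NA before trusting the result.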
Edit: Made some edits to improve readability and to show how to adapt the code for general use, based on @user3386170's suggestion.