How to compute confidence interval of the fitted value via nls()


My data consist of two columns - time and cumulative number like below:

time <- c(1:14)
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)

My nonlinear function is:

B/(B*C*exp(-A*B*time) + 1)

My objective is to fit this model to my data by nonlinear regression with nls() and to find confidence intervals for the fitted values. I have tried the following:

m1 <- nls(cum.num ~ B/((B*C)*exp(-A*B*time) + 1),start=list(A=0.001,B=1000,C=0.5))

I tried the following to compute the fitted value of my model:

predict(m1,interval="predict") 

I got only the fitted values, without lower and upper confidence bounds:

[1]  116.9912  145.7954  181.1951  224.4367  276.8663  339.8665  414.7550
[8]  502.6399  604.2369  719.6632  848.2417  988.3638 1137.4632 1292.1377

My questions are:

a) Is there any way I can compute lower and upper bounds for the fitted values? (Normally, predict() on an lm() fit returns the fitted value together with the lower and upper bound when an interval is requested.)

b) Suppose I have new time values like:

new.time<-c(15:20)

Can I compute the predicted value of cum.num at new.time along with lower and upper bound?

Thank You so much for your HELP!!!!


Solution

  • In your example, the model does not seem to fit the data well, and the sample size is quite small. Normally this means that something is going wrong, and you should revise the model before doing any further analysis. I will still show a way to calculate a "confidence interval" with the bootstrap method, although it may not be valid in this case.

    These are the data we need:

    time <- c(1:14)
    cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)
    new.time <- c(15:20)
    all.time <- c(time, new.time)
    

    We may give them other names, which are helpful for more general usage:

    y <- cum.num       # the dependent variable values from the data
    x <- time          # the independent variable values from the data
    new.x <- all.time  # the independent variable values over which we want to predict
    

    Here is the nonlinear least-squares model for this case. It is used inside the estimate function further down and should be replaced with your own model in the general case:

    nls(y ~ B/((B*C)*exp(-A*B*x) + 1), start=list(A=0.001,B=1000,C=0.5),
        control = nls.control(maxiter = 500, warnOnly = TRUE))
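
    The lack of fit mentioned at the top of this answer can be checked directly. This is a minimal sketch that reuses the question's data, model, and starting values; the name fit.check is made up for this illustration and is not part of the original code:

```r
# Data from the question
time <- 1:14
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)

# Same model, starting values, and control settings as used in this answer
m1 <- nls(cum.num ~ B / ((B * C) * exp(-A * B * time) + 1),
          start = list(A = 0.001, B = 1000, C = 0.5),
          control = nls.control(maxiter = 500, warnOnly = TRUE))

# Observed values next to fitted values and residuals
# (fit.check is a hypothetical name used only in this sketch)
fit.check <- data.frame(time     = time,
                        observed = cum.num,
                        fitted   = as.numeric(fitted(m1)),
                        residual = as.numeric(residuals(m1)))
print(fit.check)

# Residual standard error of the fit
summary(m1)$sigma
```

    Large residuals relative to the observed values, especially at the first and last time points, are a sign that the curve does not describe the data well.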
    

    Based on the model, we can define an estimate function that returns a vector of fitted values and predictions for a randomly generated index. Its argument is a vector of sample indices. Inside the function, the model is fitted to the observations selected by that index, and the fitted model is then used to produce the fitted values and predictions (the question asks for a CI for both).

    estimate <- function(ind){
        x <- x[ind]
        y <- y[ind]
        m1 <- nls(y ~ B/((B*C)*exp(-A*B*x) + 1), start=list(A=0.001,B=1000,C=0.5),
              control = nls.control(maxiter = 500, warnOnly = TRUE))
        predict(m1, newdata = list(x = new.x))
    }
    
    
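    # Fit to the original data; point estimates over all.time (fitted values and predictions)
    m1 <- nls(cum.num ~ B/((B*C)*exp(-A*B*time) + 1),start=list(A=0.001,B=1000,C=0.5))
    predict0 <- predict(m1, newdata = list(time = all.time))
    # 1000 bootstrap replicates: one column per resample, one row per time point
    predict1 <- replicate(1000, estimate(sample.int(14, replace = TRUE)))
    # The 5% and 95% quantiles of each row give a 90% percentile interval
    intervals <- apply(predict1, 1, quantile, probs = c(0.05, 0.95))
    rbind(predict0, intervals)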
    

    predict1 is a matrix that stores the bootstrap results. Each bootstrap sample has the same size as the original sample (14 in this example) and is drawn from the original sample by simple random sampling with replacement, so sample.int(14, replace = TRUE) is used to generate the index for each bootstrap sample. The estimate function then produces a vector of fitted values and predictions for each randomly generated index.
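
    To see the shapes involved without fitting any model, here is a toy sketch of the same resampling pattern. The names toy, toy.estimate, boot, and toy.intervals are invented for this illustration, and the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed, only to make this toy run repeatable

# Toy data and a toy "estimate" that returns 3 numbers for a resampled index
toy <- c(10, 20, 30, 40, 50)
toy.estimate <- function(ind) {
    s <- toy[ind]
    c(mean = mean(s), min = min(s), max = max(s))
}

# One bootstrap index: 5 draws from 1..5 with replacement
ind <- sample.int(5, replace = TRUE)
print(ind)

# replicate() binds the returned vectors column-wise:
# 3 rows (one per statistic) and 200 columns (one per bootstrap sample)
boot <- replicate(200, toy.estimate(sample.int(5, replace = TRUE)))
dim(boot)

# Row-wise quantiles give one lower and one upper bound per statistic:
# 2 rows (5% and 95%) and 3 columns
toy.intervals <- apply(boot, 1, quantile, probs = c(0.05, 0.95))
dim(toy.intervals)
```

    In the real code the three toy statistics are replaced by the 20 fitted values and predictions, so predict1 has 20 rows and 1000 columns.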

    Since predict1 holds the bootstrapped fitted values and predictions, I calculate the 90% CI from the bootstrap estimates. During the bootstrap procedure the nls function gives a lot of warnings, which implies that something is wrong numerically; this is consistent with the small sample size and the poorly fitting model. The final results look like this:

    > rbind(predict0, intervals)
                  [,1]      [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
    predict0 116.99118 145.79538 181.1951 224.4367 276.8663 339.8665 414.7550 502.6399 604.2369
    5%        39.22272  67.34464 111.2190 173.7619 231.7736 289.7346 358.8469 436.2569 524.8187
    95%      162.92948 190.60295 224.2462 266.1298 314.1032 392.3228 504.1270 611.3698 704.2803
                [,10]    [,11]     [,12]     [,13]     [,14]     [,15]     [,16]     [,17]    [,18]
    predict0 719.6632 848.2417  988.3638 1137.4632 1292.1377 1448.4271 1602.2033 1749.5981 1887.374
    5%       627.1981 739.8984  822.7940  838.2366  846.9043  851.8955  854.2859  855.8558  856.873
    95%      799.1904 923.1220 1068.4667 1231.6091 1416.4405 1631.2212 1900.6581 2220.5415 2617.839
                 [,19]     [,20]
    predict0 2013.1701 2125.5890
    5%        857.4619  857.8027
    95%      3072.8531 3594.9036
    > 
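
    The warnings mentioned above can be counted rather than just scrolled past. This is only a sketch: fit.status is a hypothetical helper, the seed is arbitrary, and 200 resamples are used instead of 1000 to keep it quick. It records whether each bootstrap fit finished cleanly, raised a warning, or stopped with an error:

```r
time <- 1:14
cum.num <- c(20, 45, 99, 195, 301, 407, 501, 582, 679, 753, 790, 861, 1011, 1441)

# Hypothetical helper: fit the model to one bootstrap sample and report
# "ok", "warning", or "error" depending on how nls() behaved
fit.status <- function(ind) {
    x <- time[ind]
    y <- cum.num[ind]
    tryCatch({
        nls(y ~ B / ((B * C) * exp(-A * B * x) + 1),
            start = list(A = 0.001, B = 1000, C = 0.5),
            control = nls.control(maxiter = 500, warnOnly = TRUE))
        "ok"
    },
    warning = function(w) "warning",
    error   = function(e) "error")
}

set.seed(1)  # arbitrary seed, only for repeatability
status <- replicate(200, fit.status(sample.int(14, replace = TRUE)))

# How many bootstrap fits fell into each category
table(status)
```

    A large share of "warning" or "error" results is another reason to treat the intervals above with caution.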
    

    Edit: Made some edits to improve readability and to illustrate how to use the code in general, based on @user3386170's suggestion.
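
    For general usage, the steps in this answer could be wrapped in one function. The following is a sketch under assumptions, not a tested package-quality routine: boot.nls.ci and its arguments are names invented here, it resamples rows of a data frame as the code above resamples indices, and a bootstrap fit that stops with an error is counted as NA rather than aborting the whole run:

```r
# Hypothetical helper (not from the original answer): percentile bootstrap
# intervals for nls() predictions, resampling rows of `data` with replacement
boot.nls.ci <- function(formula, data, start, newdata, R = 1000, level = 0.90) {
    fit.one <- function(d) {
        nls(formula, data = d, start = start,
            control = nls.control(maxiter = 500, warnOnly = TRUE))
    }
    n <- nrow(data)
    boot <- replicate(R, {
        d <- data[sample.int(n, replace = TRUE), , drop = FALSE]
        # A fit that stops with an error contributes a column of NA
        tryCatch(as.numeric(predict(fit.one(d), newdata = newdata)),
                 error = function(e) rep(NA_real_, nrow(newdata)))
    })
    # Force a matrix even when newdata has a single row
    boot <- matrix(boot, nrow = nrow(newdata))
    alpha <- (1 - level) / 2
    ci <- apply(boot, 1, quantile, probs = c(alpha, 1 - alpha), na.rm = TRUE)
    point <- as.numeric(predict(fit.one(data), newdata = newdata))
    data.frame(newdata, fit = point, lower = ci[1, ], upper = ci[2, ])
}

# Usage with the data and model from the question
dat <- data.frame(time = 1:14,
                  cum.num = c(20, 45, 99, 195, 301, 407, 501, 582,
                              679, 753, 790, 861, 1011, 1441))
set.seed(1)  # arbitrary seed, only for repeatability
res <- boot.nls.ci(cum.num ~ B / ((B * C) * exp(-A * B * time) + 1),
                   data = dat,
                   start = list(A = 0.001, B = 1000, C = 0.5),
                   newdata = data.frame(time = 1:20),
                   R = 500)
res
```

    The warnings from nls are deliberately left visible here, since they are the main signal that the model and the data do not agree.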