Search code examples
rggplot2non-linear-regressionstatistics-bootstrap

Plot the median, confidence interval of a bootstrap output in ggplot2


I have a dataframe df (see below)

dput(df)
    structure(list(x = c(49, 50, 51, 52, 53, 54, 55, 56, 1, 2, 3, 
    4, 5, 14, 15, 16, 17, 2, 3, 4, 5, 6, 10, 11, 3, 30, 64, 66, 67, 
    68, 69, 34, 35, 37, 39, 2, 17, 18, 99, 100, 102, 103, 67, 70, 
    72), y = c(2268.14043972082, 2147.62290922552, 2269.1387550775, 
    2247.31983098201, 1903.39138268307, 2174.78291538358, 2359.51909126411, 
    2488.39004804939, 212.851575751527, 461.398994384333, 567.150629704352, 
    781.775113821961, 918.303706148872, 1107.37695799186, 1160.80594193377, 
    1412.61328924168, 1689.48879626486, 260.737164468854, 306.72700499362, 
    283.410379620422, 366.813913489692, 387.570173754128, 388.602676983443, 
    477.858510450125, 128.198042456082, 535.519377609133, 1028.8780498564, 
    1098.54431357711, 1265.26965941035, 1129.58344809909, 820.922447928053, 
    749.343583476846, 779.678206156474, 646.575242339517, 733.953282899613, 
    461.156280127354, 906.813018662913, 798.186995701282, 831.365377249207, 
    764.519073183124, 672.076289062505, 669.879217186302, 1341.47673353751, 
    1401.44881976186, 1640.27575962036)), .Names = c("x", "y"), row.names = c(NA, 
    -45L), class = "data.frame")

I have created on a non-linear regression (nls) based on my dataset.

nls1 <- nls(y~A*(x^B)*(exp(k*x)), 
            data = df, 
            start = list(A = 1000, B = 0.170, k = -0.00295), algorithm = "port")

I then computed a bootstrap for this function to get multiple sets of parameters (A,B and k).

library(nlstools)
Boo <- nlsBoot(nls1, niter = 200)

I now want to plot the median curve as well as the upper and lower confidence interval curves computed from the bootstrap object together in one ggplot2. The parameters (A,B and K) of each curve is contained in Boo_Gamma$bootCI. Can someone help me out with it? Thanks in advance.


Solution

  • AFAIK, package nlstools only returns the bootstrapped parameter estimates, not the predicted values...

    Hence, here is a quick solution, manually using the bootstrapped parameter estimates to compute the predictions and then recomputing the stats out of the predictions, since the model here is non-linear. It is not the most elegant, but it should do it :)

    # Matrix with the bootstrapped parameter estimates
    Theta_mat <- Boo$coefboot
    
    # Model
    fun <- function(x, theta) theta["A"] * (x ^ theta["B"]) * (exp(theta["k"] * x))
    
    # Points where to evaluate the model
    x_eval <- seq(min(df$x), max(df$x), length.out = 100)
    
    # Matrix with the predictions
    Pred_mat <- apply(Theta_mat, 1, function(theta) fun(x_eval, theta))
    
    # Pack the estimates for plotting
    Estims_plot <- cbind(
        x = x_eval, 
        as.data.frame(t(apply(Pred_mat, 1, function(y_est) c(
            median_est = median(y_est), 
            ci_lower_est = quantile(y_est, probs = 0.025, names = FALSE), 
            ci_upper_est = quantile(y_est, probs = 0.975, names = FALSE)
        ))))
    )
    
    library(ggplot2)
    ggplot(data = Estims_plot, aes(x = x, y = median_est, ymin = ci_lower_est, ymax = ci_upper_est)) + 
        geom_ribbon(alpha = 0.7, fill = "grey") + 
        geom_line(size = rel(1.5), colour = "black") + 
        geom_point(data = df, aes(x = x, y = y), size = rel(4), colour = "red", inherit.aes = FALSE) + 
        theme_bw() + labs(title = "Bootstrap results\n", x = "x", y = "y")
    ggsave("bootpstrap_results.pdf", height = 5, width = 9)
    

    Bootstrap results