Search code examples
rloopsggplot2regressiongam

Loops through variable lists - embedded loops


I am a long-time STATA user finally trying to master R so that I can improve my graphics. In the following code, I fit a GAM to an outcome variable y1 using 5 exposure variables x1 - x5 and then plot the predictions for x1.

What I want to do is have two loops, one embedded inside the other so that the first loop iterates over the 5 outcomes, fitting the GAM for each and then, in the embedded second loop, it iterates over the 5 exposures, plotting the predictions for each. The result would be twenty five plots of 5 variables from each of five GAMs. In the real database, the variables aren't numbered, so it has to loop over the variable names as strings.

y1.gam <- mgcv::gam(y1~s(x1,bs="cr",fx=TRUE)+
                   s(x2,bs="cr",fx=TRUE)+
                   s(x3,bs="cr",fx=TRUE)+
                   s(x4,bs="cr",fx=TRUE)+
                   s(x5,bs="cr",fx=TRUE)+
                  family = poisson(link = "log"),
                  data = data)
y1.x1.plot <- plotGAM(gamFit = y1.gam , smooth.cov = "x1", groupCovs = NULL,
       plotCI=TRUE, orderedAsFactor = FALSE)

If it helps, here's how it would go in STATA:

global outcome y1 y2 y3 y4 y5
global exposure x1 x2 x3 x4 x5

foreach v of varlist $outcome {
    gam `v’ $exposure, …
    foreach w of varlist $exposure{
         plot `w’…
    }
}

Hope you can help.

Thanks.

Josh


Solution

  • Try this. Used mtcars as example dataset, even if it's for sure not an appropriate example dataset for this kind of model. However, I hope it's sufficient to show the general approach. The key of the loops is to use substitute to set up a formula object for the estimation step. The result is a list containing the models, the plots, the formulas, ...

    vars_outcome <- c("mpg", "disp")
    vars_exposure <- c("hp", "qsec")
    
    # Grid of outcome and exposure variables
    vars_grid <- expand.grid(out = vars_outcome, exp = vars_exposure, stringsAsFactors = FALSE)
    # Init list for formulas, models, plots
    mods <- list(out = vars_grid$out, exp = vars_grid$exp, fmla = list(), mod = list(), mod = list())
    
    for (i in seq_len(nrow(vars_grid))) {
      # Set up the formula
      mods$fmla[[i]] <- substitute(out ~ s(exp, bs="cr",fx=TRUE), list(out = as.name(mods$out[[i]]), exp = as.name(mods$exp[[i]])))
      # Estimate Model
      mods$mod[[i]] <- mgcv::gam(mods$fmla[[i]], family = poisson(link = "log"), data = mtcars)
      # Plot Model
      mods$plt[[i]] <- voxel::plotGAM(gamFit = mods$mod[[i]] , smooth.cov = mods$exp[[i]], groupCovs = NULL, plotCI=TRUE, orderedAsFactor = FALSE)
    
      # Create a "variable" containing the plot 
      assign(paste(mods$out[[i]], mods$exp[[i]], sep = "_"), mods$plt[[i]])
    
    }
    
    ## Name the list with the plots
    names(mods$plt) <- paste(mods$out, mods$exp, sep = "_")
    
    mods$fmla[[1]]
    #> mpg ~ s(hp, bs = "cr", fx = TRUE)
    mods$mod[[1]]
    #> 
    #> Family: poisson 
    #> Link function: log 
    #> 
    #> Formula:
    #> mpg ~ s(hp, bs = "cr", fx = TRUE)
    #> attr(,".Environment")
    #> <environment: R_GlobalEnv>
    #> 
    #> Estimated degrees of freedom:
    #> 9  total = 10 
    #> 
    #> UBRE score: -0.1518201
    mods$plt[[1]]
    

    Created on 2020-03-28 by the reprex package (v0.3.0)