running multiple lines of R code that differ by a single variable each time to improve readability

I am looking to improve the readability of my code by seeing if there is a way to "loop" or "re-run" lines of code that are very similar but differ by a single variable each time.

My actual data analysis involves running a number of blmer calls from the blme package. Each analysis has a dependent variable, an independent variable (of which there are many), a "wave" variable (as the data were collected at 3 timepoints), and a unique participant id as a random effect.

I'm trying to build a number of models, all of which are very similar, but each differs on what is entered as the independent variable.

In the code below, I have outlined some more details, built a new fictitious data file, and tried to recreate models similar to those in my actual file.

The code runs without problem on my real data and here in the fictitious data. What I'd like to draw attention to here is how even with just 3 models included (as is the case in my example below) the code begins to become long and repetitive.

##test script##
library(dplyr)
library(tidyverse)
library(panelr) #provides long_panel(), used below to reshape the data
library(blme)
#packages loaded - I'm not sure dplyr and tidyverse are needed (tidyverse
#already includes dplyr); I just loaded them in case...but blme is for the
#Bayesian models coming later
#everything below worked in RStudio on my end but, like I say, I don't
#know if that is because of the above packages or not...

##build a file
DV0 <- c(100, 50, 75, 80, 20, 30) #let's say performance on a soccer task at time 1 - max 100
DV1 <- c(100, 60, 80, 80, 25, 40) #performance on soccer task at time 2
DV2 <- c(95, 55, 70, 70, 20, 35) #performance on soccer task at time 3
IV1.0 <- c(90, 60, 65, 75, 40, 50) #score on cognitive task A at time 1 - max 100
IV1.1 <- c(95, 70, 75, 80, 50, 70) #score on cog task A at time 2 
IV1.2 <- c(90, 55, 60, 70, 45, 60) #score on cog task A at time 3
IV2.0 <- c(10, 40, 50, 60, 20, 25) #score on cognitive task B at time 1 - max 100
IV2.1 <- c(20, 50, 60, 75, 35, 35) #score on cog task B at time 2
IV2.2 <- c(15, 40, 40, 55, 25, 25) #score on cog task B at time 3
id <- c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")

##create a data frame before pivot to a better format for longitudinal data
df <- data.frame(DV0, DV1, DV2, IV1.0, IV1.1, IV1.2, IV2.0, IV2.1, IV2.2,
                 id)
df.long <- long_panel(df, begin = 0, end = 2, label_location = "end")

#now onto the main analyses 
#here I want to use "blmer" from "blme" package to understand how performance
#on the soccer task first is affected by time alone (model1 below). 
#Next, I want to check whether adding performance on cognitive task A
#influences performance (model2 below), before running the same analyses but with
#cognitive task B (model3 below) - in this example I have just two cognitive
#tasks, but in my real work I have many more IVs to test (in this case, let's
#just say they would be more cognitive tasks). Finally, I plan to add an
#individual slope and intercept based on the id variable

#time alone and soccer task performance
model1 <- blmer(DV ~ wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model1)

#new experimental model with cognitive task A performance added
model2 <- blmer(DV ~ IV1. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model2)
anova(model1, model2)

#a similar experimental model with cognitive task B performance instead of A
model3 <- blmer(DV ~ IV2. + wave + (1 | id), data = df.long, REML = FALSE,
                fixef.prior = normal)
summary(model3)
anova(model1, model3)

#in the real data I then have many more models with IV1. or IV2. swapped for
#another independent variable (e.g., IV3. or IV4.), and as a result the code
#is very long. Can the above be written in fewer lines of code? From what I've
#been reading, maybe I could use a loop somewhere so that "IV.*" is replaced
#each time?

#thanks in advance for any help!

So, if you have any ways to run the code for model1, model2, and model3 in this example in fewer lines of code, that would be great.
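long_panel() comes from the panelr package. If panelr isn't available, base R's reshape() can produce an equivalent long data frame. The sketch below assumes the wide columns follow the stem-plus-wave naming used above (DV0 to DV2, IV1.0 to IV1.2, IV2.0 to IV2.2); unlike long_panel(), it returns a plain data.frame rather than a panel_data object, and the stems/waves helper vectors are illustrative names.

```r
# Rebuild the example's wide data (same values as the question)
df <- data.frame(
  DV0 = c(100, 50, 75, 80, 20, 30), DV1 = c(100, 60, 80, 80, 25, 40),
  DV2 = c(95, 55, 70, 70, 20, 35),
  IV1.0 = c(90, 60, 65, 75, 40, 50), IV1.1 = c(95, 70, 75, 80, 50, 70),
  IV1.2 = c(90, 55, 60, 70, 45, 60),
  IV2.0 = c(10, 40, 50, 60, 20, 25), IV2.1 = c(20, 50, 60, 75, 35, 35),
  IV2.2 = c(15, 40, 40, 55, 25, 25),
  id = c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")
)

stems <- c("DV", "IV1.", "IV2.")  # variable stems; the wave number is appended
waves <- 0:2

# Base R reshape: one vector of wide column names per stem, in wave order
df.long <- reshape(
  df,
  direction = "long",
  varying   = lapply(stems, function(s) paste0(s, waves)),
  v.names   = stems,
  timevar   = "wave",
  times     = waves,
  idvar     = "id"
)
rownames(df.long) <- NULL
head(df.long)
```

The result has one row per participant and wave, with columns id, wave, DV, IV1. and IV2., so the model formulas below can be used unchanged.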

Solution

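You can create a function that receives the independent variable as a string, plus the data frame and other options, and builds the model formula with as.formula(). Then apply the function to each of your independent variables using lapply(). You can use "" as the "independent variable" when running the wave-only model (i.e. model 1); R then reads the leftover "+" as a harmless unary plus, which is why the formula for models[[1]] in the output below reads DV ~ +wave + (1 | id).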

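get_model <- function(ind_var, df, REML = FALSE, fixef.prior = "normal", ...) {
  # build e.g. "DV ~ IV1. + wave + (1 | id)" from the variable name
  f <- as.formula(paste0("DV ~ ", ind_var, " + wave + (1 | id)"))
  # fit the Bayesian mixed model; extra arguments are passed on to blmer()
  blmer(f, data = df, REML = REML, fixef.prior = fixef.prior, ...)
}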
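Passing "" works because R treats the stray + as a unary plus. A tidier variant, sketched below, builds the formula with base R's reformulate() and drops an empty independent variable, so the wave-only model gets a clean DV ~ wave + (1 | id). make_formula() and get_model2() are hypothetical names; get_model2() calls blme::blmer() with the same defaults as get_model(), and blme only needs to be installed when it is actually called.

```r
# Hypothetical helper: build "DV ~ <iv> + wave + (1 | id)", skipping an empty iv
make_formula <- function(ind_var, response = "DV",
                         covariates = c("wave", "(1 | id)")) {
  terms <- c(ind_var[nzchar(ind_var)], covariates)
  reformulate(terms, response = response)
}

# Hypothetical wrapper: same defaults as get_model(), formula from make_formula()
get_model2 <- function(ind_var, df, REML = FALSE, fixef.prior = "normal", ...) {
  blme::blmer(make_formula(ind_var), data = df, REML = REML,
              fixef.prior = fixef.prior, ...)
}

# Inspect the formulas that would be fitted, without fitting anything
lapply(c("", "IV1.", "IV2."), make_formula)
```

Keeping the covariates in their own argument means a random slope, for example, could later be added in one place rather than in every model call.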

Now build a list called models, with one fitted model per entry:

models = lapply(c("", "IV1.", "IV2."), get_model, df=df.long)

You can then run any anova comparison you like, for example:

anova(models[[1]], models[[3]])

Output:

Data: df
Models:
models[[1]]: DV ~ +wave + (1 | id)
models[[3]]: DV ~ IV2. + wave + (1 | id)
            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)   
models[[1]]    4 141.41 144.98 -66.707   133.41                        
models[[3]]    5 133.12 137.57 -61.560   123.12 10.296  1   0.001333 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
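The pairwise anova() calls can be looped as well. The sketch below names the list so that each model is labelled by its independent variable, then compares every IV model with the wave-only baseline and collects the likelihood-ratio p-values into one named vector. To run on its own it rebuilds the example data with base R's reshape() (an assumption; the question uses panelr::long_panel()) and it assumes blme is installed; ivs, comparisons and p_values are illustrative names.

```r
library(blme)

# Example data from the question, reshaped to long format with base R
df <- data.frame(
  DV0 = c(100, 50, 75, 80, 20, 30), DV1 = c(100, 60, 80, 80, 25, 40),
  DV2 = c(95, 55, 70, 70, 20, 35),
  IV1.0 = c(90, 60, 65, 75, 40, 50), IV1.1 = c(95, 70, 75, 80, 50, 70),
  IV1.2 = c(90, 55, 60, 70, 45, 60),
  IV2.0 = c(10, 40, 50, 60, 20, 25), IV2.1 = c(20, 50, 60, 75, 35, 35),
  IV2.2 = c(15, 40, 40, 55, 25, 25),
  id = c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")
)
stems <- c("DV", "IV1.", "IV2.")
df.long <- reshape(df, direction = "long",
                   varying = lapply(stems, function(s) paste0(s, 0:2)),
                   v.names = stems, timevar = "wave", times = 0:2, idvar = "id")

# Same helper as in the answer
get_model <- function(ind_var, df, REML = FALSE, fixef.prior = "normal", ...) {
  f <- as.formula(paste0("DV ~ ", ind_var, " + wave + (1 | id)"))
  blmer(f, data = df, REML = REML, fixef.prior = fixef.prior, ...)
}

ivs <- c("IV1.", "IV2.")  # further IVs would be added here

# Named list: "baseline" is the wave-only model, the others are named by IV
models <- lapply(setNames(c("", ivs), c("baseline", ivs)), get_model, df = df.long)

# Compare each IV model against the baseline with a likelihood-ratio test
comparisons <- lapply(models[ivs], function(m) anova(models$baseline, m))

# The second row of each anova table holds the test for the added IV
p_values <- sapply(comparisons, function(tab) tab[["Pr(>Chisq)"]][2])
p_values
```

Because every model goes through get_model() with the same data argument, anova() accepts each pair; adding another IV only means extending ivs.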

There is another option, which is to make df.long even "longer" (stacking the independent variables into a single value column, labelled by which variable each value came from), and then estimate one model per group. Here is an example of doing that with data.table:

library(data.table)
setDT(df.long)

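# stack IV1. and IV2. into one "value" column, labelled by the new "ind_var" column
df.longer <- melt(df.long, measure.vars = c("IV1.", "IV2."), variable.name = "ind_var")

rbind(
  # wave-only baseline model, labelled "None"
  df.long[, .(model = list(blmer(DV ~ wave + (1 | id), REML = FALSE, fixef.prior = "normal")))][, ind_var := "None"],
  # one model per independent variable, fitted group by group
  df.longer[, .(model = list(blmer(DV ~ value + wave + (1 | id), REML = FALSE, fixef.prior = "normal"))), by = ind_var]
)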

The output is a data.table with one fitted model per row:

            model ind_var
           <list>  <fctr>
1: <blmerMod[14]>    None
2: <blmerMod[14]>    IV1.
3: <blmerMod[14]>    IV2.
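The same "longer" idea can also be sketched in base R without data.table: stack the IV columns into one value column with an ind_var label, split() by that label, and fit one model per piece with lapply(). This is a swapped-in base R equivalent, not the answer's data.table code; it assumes blme is installed and rebuilds the example data with reshape() rather than panelr::long_panel().

```r
library(blme)

# Example data from the question, reshaped to long format with base R
df <- data.frame(
  DV0 = c(100, 50, 75, 80, 20, 30), DV1 = c(100, 60, 80, 80, 25, 40),
  DV2 = c(95, 55, 70, 70, 20, 35),
  IV1.0 = c(90, 60, 65, 75, 40, 50), IV1.1 = c(95, 70, 75, 80, 50, 70),
  IV1.2 = c(90, 55, 60, 70, 45, 60),
  IV2.0 = c(10, 40, 50, 60, 20, 25), IV2.1 = c(20, 50, 60, 75, 35, 35),
  IV2.2 = c(15, 40, 40, 55, 25, 25),
  id = c("Jon", "Sara", "Lisa", "Tim", "Joe", "Paul")
)
stems <- c("DV", "IV1.", "IV2.")
df.long <- reshape(df, direction = "long",
                   varying = lapply(stems, function(s) paste0(s, 0:2)),
                   v.names = stems, timevar = "wave", times = 0:2, idvar = "id")

ivs <- c("IV1.", "IV2.")

# Make the data "longer": one copy of id/wave/DV per IV, its scores in "value"
df.longer <- do.call(rbind, lapply(ivs, function(v) {
  data.frame(df.long[, c("id", "wave", "DV")], ind_var = v, value = df.long[[v]])
}))

# Baseline (wave only) plus one model per IV, fitted to each ind_var subset
models <- c(
  list(None = blmer(DV ~ wave + (1 | id), data = df.long,
                    REML = FALSE, fixef.prior = "normal")),
  lapply(split(df.longer, df.longer$ind_var), function(d) {
    blmer(DV ~ value + wave + (1 | id), data = d,
          REML = FALSE, fixef.prior = "normal")
  })
)
names(models)
```

Each element is an ordinary fitted model, so summary(models[["IV2."]]) works as before. One caveat: lme4's anova() method checks that the compared models were fitted to the same data object, so comparing these subset-fitted models with the baseline may be refused, whereas the get_model() approach fits every model with the same data argument.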