r, for-loop, linear-regression

How to loop a linear regression over multiple subsets of a factor variable


I'm trying to write a for loop that runs the same regression (same dependent and independent variables) four times, once for each of the 4 levels of a factor variable. I then want to save the output of each linear regression. Each level has approximately 500 rows of data.

My initial thought was to do something like this, but I am new to R and the different methods of iteration.

Regressionresults <- list()

for (i in levels(mydata$factorvariable)) {
  Regressionresults[[i]] <- lm(dependent ~ ., data = mydata)
}

I suspect that this is quite easy to do but I don't know how.

If you could also direct me to any help documentation or other resource where I can learn how to write these types of loops so I don't have to ask similar questions again, I'd be grateful.

Many thanks in advance!


Solution

  • The problems with the code in the question are:

    1. in R it is normally better not to use loops in the first place
    2. conventionally i is used for a sequential index so it is not a good choice of name to use for levels
    3. the body of the loop does not do any subsetting so it will assign the same result on each iteration
    4. posts to SO (Stack Overflow) should include reproducible data; the question did not, and instead referred to objects without defining their contents. Please read the instructions at the top of the tag page.
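
As a rough sketch of point 3, here is the loop from the question with the subsetting added. Since mydata is not available, iris stands in for it, Species for factorvariable, and Petal.Width for dependent; the names fits and lev are chosen here purely for illustration. One assumption worth flagging: with a `~ .` formula the grouping factor itself is among the predictors, and within a single level it has only one value, so lm would be expected to fail with a contrasts error unless the factor is removed from the formula.

```r
# iris stands in for mydata, Species for factorvariable,
# Petal.Width for dependent (all substitutions are assumptions).
fits <- list()

for (lev in levels(iris$Species)) {
  # Keep only the rows belonging to the current level.
  rows <- iris[iris$Species == lev, ]
  # "." means "all other columns"; "- Species" drops the grouping
  # factor, which is constant within each subset.
  fits[[lev]] <- lm(Petal.Width ~ . - Species, data = rows)
}

names(fits)          # one fitted model per level
coef(fits$setosa)    # coefficients for a single level
```

Each element of fits is an ordinary lm object, so summary(), coef(), predict() and so on can be applied to it individually.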

    Here are some approaches using the built-in iris data frame for reproducibility. Each results in a named list whose names are the levels of Species.

    1) lm subset argument. Map over the levels, giving a list:

    sublm <- function(x) lm(Petal.Width ~ Sepal.Width, iris, subset = Species == x)
    levs <- levels(iris$Species)
    Map(sublm, levs)
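
The question also asks how to save the output of each regression. A minimal sketch, assuming the list returned by Map is assigned to a variable (fits, coefs and summaries are names chosen here, not part of the answer above):

```r
# Fit one model per species, as in approach (1).
sublm <- function(x) lm(Petal.Width ~ Sepal.Width, iris, subset = Species == x)
levs <- levels(iris$Species)
fits <- Map(sublm, levs)

names(fits)          # the list names are the Species levels
fits[["setosa"]]     # retrieve one fitted model by level name

# Collect the coefficients into a matrix with one row per level.
coefs <- t(sapply(fits, coef))
coefs

# Keep the full summaries (standard errors, R-squared, ...) in a
# second named list.
summaries <- lapply(fits, summary)
summaries$versicolor$r.squared
```

Keeping the lm objects themselves in the list is usually the most flexible choice, since coefficients, residuals and summaries can all be derived from them later.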
    

    2) loop. sublm and levs are from (1).

    L <- list()
    for(s in levs) L[[s]] <- sublm(s)
    

    3) nlme. Alternatively, use lmList from the nlme package:

    library(nlme)
    L3 <- lmList(Petal.Width ~ Sepal.Width | Species, iris)
    coef(L3)
    summary(L3)
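
A further base R variant, not one of the approaches listed above, is to split the data frame by the factor and fit a model to each piece. This is only a sketch, and the names pieces and fits_split are made up for illustration:

```r
# Split iris into a named list of data frames, one per Species level.
pieces <- split(iris, iris$Species)

# Fit the same model to each piece; lapply keeps the level names.
fits_split <- lapply(pieces, function(d) lm(Petal.Width ~ Sepal.Width, data = d))

# One column of coefficients per level.
sapply(fits_split, coef)
```

This should give the same coefficients as the subset-based versions; the difference is only in whether the subsetting happens before lm is called or inside it.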