r linear-regression

Simple linear regression in R with many x variables and one y: can one piece of code fit every model instead of writing one per x?


I would like to analyse many x variables (400 of them) against a single y variable. However, I do not want to write a new model by hand for each x variable. Is it possible to write one piece of code in R (RStudio) that then fits y against each x variable in turn?


Solution

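  • If your data frame is DF and its response column is named y, loop over every other column and fit one simple regression per predictor: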

    regs <- list()
    for (v in setdiff(names(DF), "y")) {
      ## Build the formula y ~ <v> from the column name
      fm <- reformulate(v, response = "y")
      regs[[v]] <- lm(fm, data = DF)
    }
    

    The regs list now holds one fitted lm object per x variable, named after that variable.

    Example:

    ## Generate data
    n <- 1000
    set.seed(1)
    DF <- data.frame(y = rnorm(n))
    for (j in seq(400)) DF[[paste0('x',j)]] <- rnorm(n)
    ## Data is ready
    
    dim(DF)
    # [1] 1000 401
    head(names(DF))
    # [1] "y"  "x1" "x2" "x3" "x4" "x5"
    tail(names(DF))
    # [1] "x395" "x396" "x397" "x398" "x399" "x400"
    
    regs <- list()
    for (v in setdiff(names(DF), "y")) {
      ## Build the formula y ~ <v> from the column name
      fm <- reformulate(v, response = "y")
      regs[[v]] <- lm(fm, data = DF)
    }
    
    head(names(regs))
    # [1] "x1" "x2" "x3" "x4" "x5" "x6"
    
    r2s <- sapply(regs, function(x) summary(x)$r.squared)
    head(r2s, 3)
    #           x1           x2           x3 
    # 0.0000409755 0.0024376111 0.0005509134
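The same pattern used for the R-squared values can collect other per-model quantities. The block below is a minimal sketch, not part of the original answer: it builds a small stand-in data set (5 predictors instead of 400, response column assumed to be named y), fits one model per predictor, and gathers slope, p-value and R-squared into a data frame. The helper name extract_fit and the choice of the "BH" adjustment in p.adjust() are assumptions made for illustration; with hundreds of tests some small p-values are expected by chance, which is why an adjusted column is included.

```r
## Small stand-in data set (assumption: the response column is named "y")
set.seed(1)
n <- 200
DF <- data.frame(y = rnorm(n))
for (j in seq(5)) DF[[paste0("x", j)]] <- rnorm(n)

## One simple regression per predictor, stored in a named list
xs <- setdiff(names(DF), "y")
regs <- lapply(setNames(xs, xs), function(v) {
  lm(reformulate(v, response = "y"), data = DF)
})

## extract_fit() is a hypothetical helper, not a function from any package.
## Row 2 of the coefficient table is the slope (row 1 is the intercept).
extract_fit <- function(fit) {
  s  <- summary(fit)
  co <- s$coefficients
  c(slope     = unname(co[2, "Estimate"]),
    p_value   = unname(co[2, "Pr(>|t|)"]),
    r_squared = s$r.squared)
}

## sapply() returns one column per model, so transpose to one row per model
res <- as.data.frame(t(sapply(regs, extract_fit)))

## Adjust p-values for multiple testing (method chosen for illustration)
res$p_adj <- p.adjust(res$p_value, method = "BH")

## Sort so the strongest associations come first
res <- res[order(res$p_value), ]
print(res)
```

The row names of res are the predictor names, so res["x3", ] looks up a single model. This sketch assumes every predictor is non-constant; a constant column would have no estimable slope and would need to be dropped or handled before calling extract_fit.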