Run a regression analysis of a response on a variable X for every data frame split out of one data frame in R


I have a data frame (df) which looks like this:

group.no Amount Response
1          5       10
1         10       25
1          2       20
2         12       20
2          4        8
2          3        5

and I have split the data.frame into several data.frames based on their group number with

  out <- split( df , f = df$group.no )

Now I want to run a regression with lm (amount ~ response) on each of the new data.frames in "out". Please note that this is only an example; I actually have 500 split data.frames in "out".


Solution

  • Assume the input is the data frame DF defined reproducibly in the Note at the end. Pass pool = FALSE to lmList if you don't want to pool the standard errors.

    # 1
    library(nlme)
    lmList(Response ~ Amount | group.no, DF)
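
    To make the effect of pool concrete, here is a minimal sketch that fits the same lmList model both ways. The object names fit_pooled and fit_unpooled are illustrative only, and the data are repeated from the Note so that the block runs on its own.

    ```r
    library(nlme)

    # Same data as in the Note at the end of this answer.
    DF <- read.table(header = TRUE, text = "group.no Amount Response
    1          5       10
    1         10       25
    1          2       20
    2         12       20
    2          4        8
    2          3        5")

    # Default (pool = TRUE): standard errors use a residual variance
    # pooled over all groups.
    fit_pooled <- lmList(Response ~ Amount | group.no, DF)

    # pool = FALSE: each group's standard errors use only that group's residuals.
    fit_unpooled <- lmList(Response ~ Amount | group.no, DF, pool = FALSE)

    # The estimated intercepts and slopes are the same either way;
    # one row per group.
    coef(fit_pooled)
    coef(fit_unpooled)

    # Only the standard errors reported by summary() depend on pool.
    summary(fit_pooled)
    summary(fit_unpooled)
    ```

    With only three rows per group, the unpooled standard errors rest on a single residual degree of freedom each, so pooling may be the more stable choice for data this small.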
    

    An alternative is a single lm call that nests an intercept and an Amount slope within each group:

    # 2
    lm(Response ~ grp / (Amount + 1) - 1, transform(DF, grp = factor(group.no)))
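
    To see what the nested formula in # 2 estimates, the following sketch compares it with separate per-group fits. The names DF2, fit_nested and fits_sep are made up for this illustration, and the data are repeated from the Note so that the block is self-contained.

    ```r
    # Same data as in the Note at the end of this answer.
    DF <- read.table(header = TRUE, text = "group.no Amount Response
    1          5       10
    1         10       25
    1          2       20
    2         12       20
    2          4        8
    2          3        5")
    DF2 <- transform(DF, grp = factor(group.no))

    # One model: a separate intercept and Amount slope nested within each group.
    fit_nested <- lm(Response ~ grp / (Amount + 1) - 1, DF2)
    coef(fit_nested)

    # Completely separate fits, one per group, for comparison.
    fits_sep <- lapply(split(DF, DF$group.no),
                       function(d) lm(Response ~ Amount, data = d))
    sapply(fits_sep, coef)

    # Both approaches describe the same fitted lines, so the fitted values agree.
    # (This comparison relies on DF being sorted by group.no, as it is here.)
    all.equal(unname(fitted(fit_nested)),
              unname(unlist(lapply(fits_sep, fitted))))

    # The difference: one residual standard error shared by all groups
    # versus one per group.
    sigma(fit_nested)
    sapply(fits_sep, sigma)
    ```

    Because the single model shares one residual variance across groups, its standard errors and p-values will generally differ from those of the separate fits even though the coefficients match.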
    

    Or use by, which carries out completely separate regressions:

    # 3
    by(DF, DF$group.no, function(DF) lm(Response ~ Amount, DF))
    

    This last line can also be written:

    # 3a
    by(DF, DF$group.no, lm, formula = Response ~ Amount)
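
    Since the question already holds the pieces in a list called out, a base R sketch with lapply may be the most direct route. It assumes the response is Response and the predictor is Amount, as in the examples above; fits is an illustrative name, and the data are repeated from the Note so that the block runs on its own.

    ```r
    # Same data as in the Note at the end of this answer.
    DF <- read.table(header = TRUE, text = "group.no Amount Response
    1          5       10
    1         10       25
    1          2       20
    2         12       20
    2          4        8
    2          3        5")

    # The split from the question: a named list of data frames, one per group.
    out <- split(DF, f = DF$group.no)

    # Fit one model per list element; the result is a named list of lm objects.
    fits <- lapply(out, function(d) lm(Response ~ Amount, data = d))

    # Inspect the fit for a single group by its name in the list.
    summary(fits[["2"]])

    # Collect intercepts and slopes into one matrix, one row per group.
    t(sapply(fits, coef))
    ```

    The same two lines work unchanged whether out holds 2 or 500 data frames, because lapply simply iterates over the list.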
    

    R squared

    We can compute R squared by group using any of these:

    summary(lmList(Response ~ Amount | group.no, DF))$r.squared
    
    c(by(DF, DF$group.no, function(x) summary(lm(Response ~ Amount, x))$r.squared))
    
    reg.list <- by(DF, DF$group.no, lm, formula = Response ~ Amount)
    sapply(reg.list, function(x) summary(x)$r.squared)
    
    c(by(DF, DF$group.no, with, cor(Response, Amount)^2))
    
    library(dplyr)
    DF %>%
      group_by(group.no) %>%
      do(summarize(., r.squared = summary(lm(Response ~ Amount, .))$r.squared)) %>%
      ungroup
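
    With many groups it can be handy to keep the coefficients and R squared together in one table. The following base R sketch is one possible way to do that; the helper fit_one and the column names intercept and slope are hypothetical choices for this example, and the data are repeated from the Note.

    ```r
    # Same data as in the Note at the end of this answer.
    DF <- read.table(header = TRUE, text = "group.no Amount Response
    1          5       10
    1         10       25
    1          2       20
    2         12       20
    2          4        8
    2          3        5")

    # Hypothetical helper: fit one group and return a one-row summary.
    fit_one <- function(d) {
      fit <- lm(Response ~ Amount, data = d)
      data.frame(group.no  = d$group.no[1],
                 intercept = unname(coef(fit)[1]),
                 slope     = unname(coef(fit)[2]),
                 r.squared = summary(fit)$r.squared)
    }

    # Apply the helper to every group and stack the rows into one data frame.
    res <- do.call(rbind, lapply(split(DF, DF$group.no), fit_one))
    res
    ```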
    

    Note

    Lines <- "group.no Amount Response
    1          5       10
    1         10       25
    1          2       20
    2         12       20
    2          4        8
    2          3        5"
    DF <- read.table(text = Lines, header = TRUE)