r, regression, linear-regression, lm

Linear Regression and group by in R


I want to do a linear regression in R using the lm() function. My data is an annual time series with one field for year (22 years) and another for state (50 states). I want to fit a regression for each state so that at the end I have a vector of lm results, one per state. I can imagine writing a for loop over the states, fitting the regression inside the loop, and adding each result to a vector. That does not seem very R-like, however. In SAS I would use a 'by' statement, and in SQL I would use a 'group by'. What's the R way of doing this?


Solution

  • Here's one way using the lme4 package.

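     library(lme4)     # provides lmList()
     library(lattice)  # provides xyplot(); load it explicitly

     # Toy data: two states, ten years each, random response
     d <- data.frame(state=rep(c('NY', 'CA'), c(10, 10)),
                     year=rep(1:10, 2),
                     response=c(rnorm(10), rnorm(10)))
    
     # Plot one line per state
     xyplot(response ~ year, groups=state, data=d, type='l')
    
     # Fit response ~ year separately within each state
     fits <- lmList(response ~ year | state, data=d)
     fits
    #------------ output (values vary from run to run; no seed is set)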
    Call: lmList(formula = response ~ year | state, data = d)
    Coefficients:
       (Intercept)        year
    CA -1.34420990  0.17139963
    NY  0.00196176 -0.01852429
    
    Degrees of freedom: 20 total; 16 residual
    Residual standard error: 0.8201316
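
  • If you'd rather not pull in an extra package, the same "group by" idea can be sketched in base R with split() and lapply(). This is only a minimal sketch: it rebuilds toy data in the same shape as the example above so it runs on its own, and the set.seed() call and the names fits and coefs are my own choices, not something from the question. The result is a named list of ordinary lm objects, one per state, which is probably what you'd otherwise collect in the for loop.

```r
# Toy data in the same shape as the example above (values are random;
# the seed is an arbitrary choice made only for repeatability).
set.seed(1)
d <- data.frame(state = rep(c("NY", "CA"), each = 10),
                year = rep(1:10, 2),
                response = rnorm(20))

# Split the data frame by state, then fit one lm() per piece.
fits <- lapply(split(d, d$state),
               function(df) lm(response ~ year, data = df))

# fits is a named list of "lm" objects, one element per state.
names(fits)

# Collect intercept and slope for every state into one matrix
# (rows = states, columns = coefficients).
coefs <- t(sapply(fits, coef))
coefs
```

    Each element, e.g. fits[["NY"]], is a normal lm fit, so summary(), predict() and friends work on it as usual.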