Search code examples
rregressionpanel-data

How to run a cross-sectional regression for each quarter and industry?


I need to run multiple regressions one for every Industry-quarter combination e.g. a regression for financial companies for every single time period e.g. 1999Q3, 1999Q4, 2000Q1, 2000Q2... etc but also utility companies and every retail company, and every food company etc.

I need to run the regression then collect all the coefficients from the regression into a list so then I can append the list back onto the original data frame so that I have the corresponding coefficients.

For example in the data set below, I would want to run the regression Y = x1 + x2 + x3, I have tried using for loops and nested loops and collecting the coefficients into a matrix but I can't seem to get it to work (I am an R novice!)

I have a panel data set that looks a like the following with the firm name, industry, calendar quarters and a few variables:

              `Company Name`  Industry  Quater           Y                x1               x2               x3
               <chr>          <chr>     <chr>            <dbl>            <dbl>            <dbl>            <dbl>            
              A & M FOOD SE  Food       1985Q1           2.97             16.4             9.23             2.22              
              A & M FOOD SE  Food       1985Q2           5.00             40.2             11.2             3.94              
              A & M FOOD SE  Food       1985Q3           5.71             40.7             12.5             4.66              
              A & M FOOD SE  Food       1985Q4           3.85             39.5             13.0             2.79              
              A & M FOOD SE  Food       1986Q1           3.12             38.9             13.2             1.98              
              A.A. IMPORTIN  Food       1985Q4           12.5             14.0             6.66             0.005             
              A.A. IMPORTIN  Food       1986Q1           13.3             15.0             6.74             0.513              
              A.A. IMPORTIN  Food       1986Q2           13.2             15.0             6.71             0.031             
              A.A. IMPORTIN  Food       1986Q3           13.5             15.2             6.86             0.111             
              C.D. JUMPINGS  Retail     1986Q4           13.1             14.6             7.46             0.241
              C.D. JUMPINGS  Retail     1985Q4           12.5             14.0             6.66             0.005             
              C.D. JUMPINGS  Retail     1986Q1           13.3             15.0             6.74             0.513              
              C.D. JUMPINGS  Retail     1986Q2           13.2             15.0             6.71             0.031             
              Kmart          Retail     1986Q3           13.5             15.2             6.86             0.111
              Kmart          Retail     1986Q4           13.1             14.6             7.46             0.241
              Kmart          Retail     1985Q4           12.5             14.0             6.66             0.005             
              Kmart          Retail     1986Q1           13.3             15.0             6.74             0.513              
              Kmart          Retail     1986Q2           13.2             15.0             6.71             0.031             
              Kmart          Retail     1986Q3           13.5             15.2             6.86             0.111             

Thank you guys so much I have tried using weird functions from the plm library to lapply.


Solution

  • A simple base R approach is with split. split divides a data.frame from the first argument into a list of data.frames based on the levels of the second argument. Thus, with your sample data, split(data,data$`Company Name`) will result in a list of 4 data.frames.

    From there, we can use lapply to apply the lm function to that subset of the data. Because lm has a lot of arguments, it's easier just to define a new function of x (called a lambda function).

    lapply(split(data,data$`Company Name`),
           function(x) lm( Y ~ x1 + x2 + x3, data = x))
    

    The formatting is a bit messy, so you could use sapply to simplify the results.

    t(sapply(split(data,data$`Company Name`),
             function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
             )
      )
    #              (Intercept)         x1            x2        x3
    #A & M FOOD SE   0.5773632 0.01586041 -3.662652e-05 0.9607874
    #A.A. IMPORTIN  -3.6117236 0.64295788  1.067509e+00 0.1410264
    #C.D. JUMPINGS   1.7123480 0.68601589  1.775447e-01 0.1964184
    #Kmart           0.2591970 0.78346288  1.880233e-01 0.0525099
    

    If you wanted to do this for two variables, Company Name and Quarter simply supply a list to split.

    t(sapply(split(data,list(data$`Company Name`, data$Quater)),
             function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
             )
      )
    

    I can't provide the output, because many of them are empty. Hopefully your dataset is complete. It should look something like this:

    t(sapply(Filter(function(x) nrow(x) > 0, split(data,list(data$`Company Name`, data$Quater))),
              function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
              )
       )
    #                     (Intercept) x1 x2 x3
    #A & M FOOD SE.1985Q1        2.97 NA NA NA
    #A & M FOOD SE.1985Q2        5.00 NA NA NA
    #A & M FOOD SE.1985Q3        5.71 NA NA NA
    #A & M FOOD SE.1985Q4        3.85 NA NA NA
    #A.A. IMPORTIN.1985Q4       12.50 NA NA NA
    #C.D. JUMPINGS.1985Q4       12.50 NA NA NA
    #Kmart.1985Q4               12.50 NA NA NA
    #A & M FOOD SE.1986Q1        3.12 NA NA NA
    #A.A. IMPORTIN.1986Q1       13.30 NA NA NA
    #C.D. JUMPINGS.1986Q1       13.30 NA NA NA
    #Kmart.1986Q1               13.30 NA NA NA
    #A.A. IMPORTIN.1986Q2       13.20 NA NA NA
    #C.D. JUMPINGS.1986Q2       13.20 NA NA NA
    #Kmart.1986Q2               13.20 NA NA NA
    #A.A. IMPORTIN.1986Q3       13.50 NA NA NA
    #Kmart.1986Q3               13.50 NA NA NA
    #C.D. JUMPINGS.1986Q4       13.10 NA NA NA
    #Kmart.1986Q4               13.10 NA NA NA
    

    Data

    data <- structure(list(`Company Name` = structure(c(1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A & M FOOD SE", 
    "A.A. IMPORTIN", "C.D. JUMPINGS", "Kmart"), class = "factor"), 
        Industry = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
        2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Food", 
        "Retail"), class = "factor"), Quater = structure(c(1L, 2L, 
        3L, 4L, 5L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 
        6L, 7L), .Label = c("1985Q1", "1985Q2", "1985Q3", "1985Q4", 
        "1986Q1", "1986Q2", "1986Q3", "1986Q4"), class = "factor"), 
        Y = c(2.97, 5, 5.71, 3.85, 3.12, 12.5, 13.3, 13.2, 13.5, 
        13.1, 12.5, 13.3, 13.2, 13.5, 13.1, 12.5, 13.3, 13.2, 13.5
        ), x1 = c(16.4, 40.2, 40.7, 39.5, 38.9, 14, 15, 15, 15.2, 
        14.6, 14, 15, 15, 15.2, 14.6, 14, 15, 15, 15.2), x2 = c(9.23, 
        11.2, 12.5, 13, 13.2, 6.66, 6.74, 6.71, 6.86, 7.46, 6.66, 
        6.74, 6.71, 6.86, 7.46, 6.66, 6.74, 6.71, 6.86), x3 = c(2.22, 
        3.94, 4.66, 2.79, 1.98, 0.005, 0.513, 0.031, 0.111, 0.241, 
        0.005, 0.513, 0.031, 0.111, 0.241, 0.005, 0.513, 0.031, 0.111
        )), class = "data.frame", row.names = c(NA, -19L))