I need to run multiple regressions one for every Industry-quarter combination e.g. a regression for financial companies for every single time period e.g. 1999Q3, 1999Q4, 2000Q1, 2000Q2... etc but also utility companies and every retail company, and every food company etc.
I need to run the regression then collect all the coefficients from the regression into a list so then I can append the list back onto the original data frame so that I have the corresponding coefficients.
For example in the data set below, I would want to run the regression Y = x1 + x2 + x3, I have tried using for loops and nested loops and collecting the coefficients into a matrix but I can't seem to get it to work (I am an R novice!)
I have a panel data set that looks a like the following with the firm name, industry, calendar quarters and a few variables:
`Company Name` Industry Quater Y x1 x2 x3
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
A & M FOOD SE Food 1985Q1 2.97 16.4 9.23 2.22
A & M FOOD SE Food 1985Q2 5.00 40.2 11.2 3.94
A & M FOOD SE Food 1985Q3 5.71 40.7 12.5 4.66
A & M FOOD SE Food 1985Q4 3.85 39.5 13.0 2.79
A & M FOOD SE Food 1986Q1 3.12 38.9 13.2 1.98
A.A. IMPORTIN Food 1985Q4 12.5 14.0 6.66 0.005
A.A. IMPORTIN Food 1986Q1 13.3 15.0 6.74 0.513
A.A. IMPORTIN Food 1986Q2 13.2 15.0 6.71 0.031
A.A. IMPORTIN Food 1986Q3 13.5 15.2 6.86 0.111
C.D. JUMPINGS Retail 1986Q4 13.1 14.6 7.46 0.241
C.D. JUMPINGS Retail 1985Q4 12.5 14.0 6.66 0.005
C.D. JUMPINGS Retail 1986Q1 13.3 15.0 6.74 0.513
C.D. JUMPINGS Retail 1986Q2 13.2 15.0 6.71 0.031
Kmart Retail 1986Q3 13.5 15.2 6.86 0.111
Kmart Retail 1986Q4 13.1 14.6 7.46 0.241
Kmart Retail 1985Q4 12.5 14.0 6.66 0.005
Kmart Retail 1986Q1 13.3 15.0 6.74 0.513
Kmart Retail 1986Q2 13.2 15.0 6.71 0.031
Kmart Retail 1986Q3 13.5 15.2 6.86 0.111
Thank you guys so much I have tried using weird functions from the plm library to lapply.
A simple base R approach is with split
. split
divides a data.frame
from the first argument into a list of data.frame
s based on the levels of the second argument. Thus, with your sample data
, split(data,data$`Company Name`)
will result in a list of 4 data.frame
s.
From there, we can use lapply
to apply the lm
function to that subset of the data. Because lm
has a lot of arguments, it's easier just to define a new function of x
(called a lambda function).
lapply(split(data,data$`Company Name`),
function(x) lm( Y ~ x1 + x2 + x3, data = x))
The formatting is a bit messy, so you could use sapply
to simplify the results.
t(sapply(split(data,data$`Company Name`),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
# (Intercept) x1 x2 x3
#A & M FOOD SE 0.5773632 0.01586041 -3.662652e-05 0.9607874
#A.A. IMPORTIN -3.6117236 0.64295788 1.067509e+00 0.1410264
#C.D. JUMPINGS 1.7123480 0.68601589 1.775447e-01 0.1964184
#Kmart 0.2591970 0.78346288 1.880233e-01 0.0525099
If you wanted to do this for two variables, Company Name
and Quarter
simply supply a list
to split
.
t(sapply(split(data,list(data$`Company Name`, data$Quater)),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
I can't provide the output, because many of them are empty. Hopefully your dataset is complete. It should look something like this:
t(sapply(Filter(function(x) nrow(x) > 0, split(data,list(data$`Company Name`, data$Quater))),
function(x) lm( Y ~ x1 + x2 + x3, data = x)$coefficients
)
)
# (Intercept) x1 x2 x3
#A & M FOOD SE.1985Q1 2.97 NA NA NA
#A & M FOOD SE.1985Q2 5.00 NA NA NA
#A & M FOOD SE.1985Q3 5.71 NA NA NA
#A & M FOOD SE.1985Q4 3.85 NA NA NA
#A.A. IMPORTIN.1985Q4 12.50 NA NA NA
#C.D. JUMPINGS.1985Q4 12.50 NA NA NA
#Kmart.1985Q4 12.50 NA NA NA
#A & M FOOD SE.1986Q1 3.12 NA NA NA
#A.A. IMPORTIN.1986Q1 13.30 NA NA NA
#C.D. JUMPINGS.1986Q1 13.30 NA NA NA
#Kmart.1986Q1 13.30 NA NA NA
#A.A. IMPORTIN.1986Q2 13.20 NA NA NA
#C.D. JUMPINGS.1986Q2 13.20 NA NA NA
#Kmart.1986Q2 13.20 NA NA NA
#A.A. IMPORTIN.1986Q3 13.50 NA NA NA
#Kmart.1986Q3 13.50 NA NA NA
#C.D. JUMPINGS.1986Q4 13.10 NA NA NA
#Kmart.1986Q4 13.10 NA NA NA
Data
data <- structure(list(`Company Name` = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A & M FOOD SE",
"A.A. IMPORTIN", "C.D. JUMPINGS", "Kmart"), class = "factor"),
Industry = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Food",
"Retail"), class = "factor"), Quater = structure(c(1L, 2L,
3L, 4L, 5L, 4L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 7L, 8L, 4L, 5L,
6L, 7L), .Label = c("1985Q1", "1985Q2", "1985Q3", "1985Q4",
"1986Q1", "1986Q2", "1986Q3", "1986Q4"), class = "factor"),
Y = c(2.97, 5, 5.71, 3.85, 3.12, 12.5, 13.3, 13.2, 13.5,
13.1, 12.5, 13.3, 13.2, 13.5, 13.1, 12.5, 13.3, 13.2, 13.5
), x1 = c(16.4, 40.2, 40.7, 39.5, 38.9, 14, 15, 15, 15.2,
14.6, 14, 15, 15, 15.2, 14.6, 14, 15, 15, 15.2), x2 = c(9.23,
11.2, 12.5, 13, 13.2, 6.66, 6.74, 6.71, 6.86, 7.46, 6.66,
6.74, 6.71, 6.86, 7.46, 6.66, 6.74, 6.71, 6.86), x3 = c(2.22,
3.94, 4.66, 2.79, 1.98, 0.005, 0.513, 0.031, 0.111, 0.241,
0.005, 0.513, 0.031, 0.111, 0.241, 0.005, 0.513, 0.031, 0.111
)), class = "data.frame", row.names = c(NA, -19L))