Search code examples
rdataframeregressionlinear-regressionpanel-data

How to run linear regression model for each industry-year excluding firm i observations in R?


Here is the dput output of my dataset in R......

data1<-structure(list(Year = c(1998, 1999, 1999, 2000, 1996, 2001, 1998, 
1999, 2002, 1998, 2005, 1998, 1999, 1998, 1997, 1998, 2000), 
    `Firm name` = c("A", "A", "B", "B", "C", "C", "D", "D", "D", 
    "E", "E", "F", "F", "G", "G", "H", "H"), Industry = c("AUTO", 
    "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", "AUTO", 
    "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", "Pharma", 
    "Pharma", "Pharma"), X = c(1, 2, 5, 6, 7, 9, 10, 11, 12, 
    13, 15, 16, 17, 18, 19, 20, 21), Y = c(30, 31, 34, 35, 36, 
    38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50), Z = c(23, 
    29, 47, 53, 59, 71, 77, 83, 89, 95, 107, 113, 119, 125, 131, 
    137, 143)), row.names = c(NA, -17L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`1` = 1L), class = "omit"))
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50), Z = c(23, 
29, 35, 41, 47, 53, 59, 65, 71, 77, 83, 89, 95, 101, 107, 113, 
119, 125, 131, 137, 143)), row.names = c(NA, -21L), class = c("tbl_df", 
"tbl", "data.frame"), na.action = structure(c(`1` = 1L), class = "omit"))

Here I am trying to regress Y~ X+Z for each industry year but excluding firm i observations.For each firm I want to estimate the linear regression model using all industry peer firms' observations but excluding firm's own observations.For example;for firm A, I want to regress Y~ X+Z by using all observations of its industry peer firms (B,C & D) across time but excluding firm A observations. Similarly I want to run model for firm B by using all observations of firm A,C & D (part of same industry as B) across time excluding firm B observations. And same procedure for firm C & D as well. I want to do this exercise for every firm within each industry. Please help.


Solution

  • As mentioned by @bonedi you can use a nested loop to accomplish this. If you want to create models for individual industry-year combinations, you will need to subset your data by Industry and Year. You can loop over Firm name and exclude that firm before creating the model. Results can be stored in a list, named by industry-year-firm. It's not a pretty solution but it should get you closer.

    lst <- list()
    
    for (ind in unique(data1$Industry)) {
      for (year in unique(data1[data1$Industry == ind, ]$Year)) {
        for (firm in unique(data1[data1$Industry == ind & data1$Year == year, ]$`Firm name`)) {
          sub_data <- data1[data1$Industry == ind & data1$Year == year & data1$`Firm name` != firm, ]
          if (nrow(sub_data) > 0) {
            name <- paste(ind, year, firm, sep = '-')
            lst[[name]] <- lm(Y ~ X + Z, data = sub_data)
          }
        }
      }
    }