Search code examples
rdplyrlinear-regressionlmpanel-data

Cross-sectional regression table output with number of positive and negative significant terms


I have a time-series panel dataset which is structured in the following way:

 df <- data.frame(
      year = c(2012L, 2013L, 2014L, 2012L, 2013L, 2014L, 2015L),
      id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
      col1 = c(11L, 13L, 13L, 16L, 15L, 15L, 16L),
       col2 = c(10L, 14L, 12L, 13L, 11L, 16L, 17L),
    col3 = c(17L, 12L, 12L, 14L, 19L, 21L, 12L),
    )
    > df
      year id col1 col2 col3
    1 2012  1   11   10   17
    2 2013  1   13   14   12
    3 2014  1   13   12   12
    4 2012  2   16   13   14
    5 2013  2   15   11   19
    6 2014  2   15   16   21
    7 2015  2   16   17   12
    > 

I would like to group the panel data by year and run a cross-sectional regression of one column on the other columns. This is the code I have so far:

reg =
  df %>%
  group_by(year) %>%
  do({model = lm(col1 ~  col2 + col3, data=.)    # create your model
  data.frame(tidy(model),              # get coefficient info
             glance(model))})

reg_results =
  reg %>%
  select(year, term, estimate, adj.r.squared) %>%
  spread(term, estimate)

stargazer(as.data.frame(reg_results), title = "Full regression", 
          type = "text", nobs = TRUE, mean.sd = TRUE, median = TRUE,
          iqr = TRUE)

In my final desired output I would like to have the average of each coefficient and I would also like to have a column for the number of positive and significant observations and the number of negative and significant observations for each regression coefficient. Something similar to the table in the following picture: https://i.sstatic.net/stOUw.jpg


Solution

  • After grouping by year, you can summarise the results of lm as a list-column, then unnest, group by term, and define the summary statistics you're interested in:

    library(tidyverse)
    library(broom)
    
    df %>% 
      group_by(year) %>%
      summarise(model = list(lm(col1 ~  col2 + col3, data = .) %>% tidy)) %>% 
      unnest(model) %>% 
      filter(term != "(Intercept)") %>% 
      group_by(term) %>% 
      summarise(avg_coef = mean(estimate),
                n_pos_sig = sum(estimate > 0 & p.value < .05),
                n_neg_sig = sum(estimate < 0 & p.value < .05))
    

    Output

    # A tibble: 2 x 4
      term  avg_coef n_pos_sig n_neg_sig
      <chr>    <dbl>     <int>     <int>
    1 col2    0.464          0         0
    2 col3    0.0682         0         0