Disclaimer: This question is closely related to one I asked two days ago, but now it concerns including between and overall R2 in stargazer() output, not in summary() output as before.
Is there a way to get plm() to calculate between R2 and overall R2 for me and include them in the stargazer() output?
To clarify what I mean by between, overall, and within R2, see this answer on StackExchange.
My understanding is that plm only calculates the within R2. I am running a two-way effects within model.
library(plm)
library(stargazer)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg)
I now also want to include between and overall R2 in the stargazer() output. How can I do that?
To make explicit what I mean by between and overall R2:
# Pooled Version (overall R2)
reg1=lm(y~x)
summary(reg1)$r.squared
# Between R2
y.means=tapply(y,id,mean)[id]
x.means=tapply(x,id,mean)[id]
reg2=lm(y.means~x.means)
summary(reg2)$r.squared
To do this in stargazer, you can use the add.lines argument. However, this adds the lines to the beginning of the summary statistics section, and there is no way to change that without modifying the source code, which is beastly. I much prefer huxtable, which provides a grammar of table building and is much more extensible and customizable.
library(tidyverse)
library(plm)
library(stargazer)  # needed for the stargazer() call below
library(huxtable)
# Create some random data
set.seed(1)
x=rnorm(100); fe=rep(rnorm(10),each=10); id=rep(1:10,each=10); ti=rep(1:10,10); e=rnorm(100)
y=x+fe+e
data=data.frame(y,x,id,ti)
# Get plm within R2
reg=plm(y~x,model="within",index=c("id","ti"), effect = "twoways", data=data)
stargazer(reg, type = "text",
          add.lines = list(
            c("Overall R2", round(r.squared(reg, model = "pooled"), 3)),
            c("Between R2", round(r.squared(update(reg, effect = "individual", model = "between")), 3))))
#>
#> ========================================
#>                  Dependent variable:
#>              ---------------------------
#>                           y
#> ----------------------------------------
#> x                     1.128***
#>                        (0.113)
#>
#> ----------------------------------------
#> Overall R2              0.337
#> Between R2              0.174
#> Observations             100
#> R2                      0.554
#> Adjusted R2             0.448
#> F Statistic    99.483*** (df = 1; 80)
#> ========================================
#> Note:        *p<0.1; **p<0.05; ***p<0.01
# I prefer huxreg, which is much more customizable!
# Create a data frame of the R2 values
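r2s <- tibble(
  name  = c("Overall R2", "Between R2"),
  value = c(r.squared(reg, model = "pooled"),
            r.squared(update(reg, effect = "individual", model = "between"))))
tab <- huxreg(reg) %>%
  # Add new R2 values after the N row.
  # add_colnames = FALSE keeps the tibble's column names out of the table;
  # newer huxtable versions would otherwise add them as an extra header row.
  add_rows(hux(r2s, add_colnames = FALSE), after = 4)
# Rename R2
tab[7, 1] <- "Within R2"
tab %>% huxtable::print_screen()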
#> ─────────────────────────────────────────────────
#>                                  (1)
#>                       ─────────────────────────
#>   x                                 1.128 ***
#>                                    (0.113)
#>                       ─────────────────────────
#>   N                               100
#>   Overall R2                        0.337
#>   Between R2                        0.174
#>   Within R2                         0.554
#> ─────────────────────────────────────────────────
#>   *** p < 0.001; ** p < 0.01; * p < 0.05.
#>
#> Column names: names, model1
Created on 2020-04-08 by the reprex package (v0.3.0)
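As a cross-check, the three R2 values can also be computed by hand in base R, following the definitions in the question. This is only a sketch under stated assumptions: it assumes a balanced panel (every id observed in every ti), and the variable names (y_rep, y_bar, y_w, and so on) are made up for illustration. It also shows why it does not matter for the between R2 whether the group means are repeated for every row (as in the question) or used once per individual (as a between regression does): in a balanced panel both give the same R2.

```r
# Same simulated data as in the question
set.seed(1)
x  <- rnorm(100)
fe <- rep(rnorm(10), each = 10)
id <- rep(1:10, each = 10)
ti <- rep(1:10, 10)
e  <- rnorm(100)
y  <- x + fe + e

# Overall (pooled) R2: plain OLS on the raw data
r2_overall <- summary(lm(y ~ x))$r.squared

# Between R2, version 1: group means repeated for every row (100 rows)
y_rep <- ave(y, id)
x_rep <- ave(x, id)
r2_between_repeated <- summary(lm(y_rep ~ x_rep))$r.squared

# Between R2, version 2: one row per individual (10 rows)
y_bar <- as.numeric(tapply(y, id, mean))
x_bar <- as.numeric(tapply(x, id, mean))
r2_between_groups <- summary(lm(y_bar ~ x_bar))$r.squared

# Within R2 (two-way): remove individual and time means, add back the grand mean.
# This simple demeaning formula is only valid for a balanced panel.
y_w <- y - ave(y, id) - ave(y, ti) + mean(y)
x_w <- x - ave(x, id) - ave(x, ti) + mean(x)
r2_within <- summary(lm(y_w ~ x_w))$r.squared

# The two between versions agree in a balanced panel
all.equal(r2_between_repeated, r2_between_groups)

c(overall = r2_overall, between = r2_between_groups, within = r2_within)
```

If the values from r.squared() above differ from these, the most likely causes are an unbalanced panel or a different type argument in r.squared().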