r, regression, linear-regression, stata, goodness-of-fit

R-squared within for a regression with multiple fixed effects


I would like to get the within R-squared for a fixed-effects regression with multiple fixed effects (say Country, Year, and Trimester). The least squares dummy variable (LSDV) model (lm in R, reg in Stata) only reports the overall R-squared, and the same is true of areg in Stata. Code or package suggestions for either R or Stata are welcome.


Solution

  • Consider simulated data in which the coefficient we want to estimate is known exactly (the true slope on x is 5).

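    library(plm)         # panel estimators (not used in the code below)
    library(xtable)      # loaded in the original answer but not used below
    library(texreg)      # screenreg() for the comparison table
    library(data.table)  # := and grouped assignment
    set.seed(100)        # make the simulated draws reproducible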
    

    First, generate some data with individual and time effects.

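    # 100 rows: ind and time (25 values each) are recycled, giving 4 rows per (ind, time) cell
    dt <- data.table(epsilon = rnorm(100), ind = rep(1:5, 5),
                     time = rep(1:5, each = 5), x = rnorm(100, 0, 2))

    # individual component, scaled by each individual's mean of x
    dt[, mu := 6 * mean(x) * rnorm(.N), by = ind]

    # time component, centred on 10 times each period's mean of x
    dt[, delta := 10 * mean(x) + rnorm(.N), by = time]

    # outcome with a true slope of 5 on x
    dt[, y := 5 * x + mu + delta + epsilon]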
    
    
    head(dt)
    ##         epsilon ind time         x         mu    delta           y x..bari. x..bar.t
    ## 1: -0.247885286   1    1 3.1530482 -34.563268 37.74058 18.69467015 3.686294 3.854510
    ## 2:  1.234916664   2    1 4.2520514 -30.682143 39.75175 31.56477572 3.508577 3.854510
    ## 3:  0.117692498   3    1 2.2582500  44.240719 37.24573 92.89539109 3.936578 3.854510
    ## 4: -0.002265777   4    1 1.9168626 -48.510759 38.83342 -0.09529645 3.455571 3.854510
    ## 5: -1.424864120   5    1 1.7842555 -11.278647 38.77298 34.99074471 4.282104 3.854510
    ## 6: -1.441965687   1    2 0.5658582  -2.075256 38.41338 37.72545392 3.686294 3.737549 
    

    Estimate the pooled OLS model:

    pooled <- lm(y~x,data=dt)
    

    Estimate the model with individual effects:

    individual..effect <- lm(y~x+as.factor(ind),data=dt)
    

    Estimate the model with individual and time effects:

    individual..time..effect <- lm(y~x+as.factor(ind)+as.factor(time),data=dt)
    

    Create the mean of x within each individual (over time) and within each period (over individuals):

    dt[,x..bari.:=mean(x),ind]
    dt[,x..bar.t:=mean(x),time]
    

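    Estimate the within estimator by regressing y on x net of its individual and time means. In this balanced design the slope equals the LSDV slope with both sets of effects. Because y itself is not demeaned, the R-squared of this regression is not the within R-squared.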

    within..estimator  <-  lm(y~I(x-x..bari.-x..bar.t),data=dt)
    

    Putting the four models side by side:

    screenreg(list(pooled, individual..effect, individual..time..effect,
                   within..estimator))
    
    ## ==========================================================================
    ##                             Model 1     Model 2     Model 3     Model 4   
    ## --------------------------------------------------------------------------
    ## (Intercept)                   0.50        1.29        4.22 ***    0.73    
    ##                              (0.33)      (0.77)      (0.80)      (0.44)   
    ## x                             5.14 ***    5.18 ***    4.99 ***            
    ##                              (0.21)      (0.22)      (0.18)               
    ## as.factor(ind)2                          -1.23       -1.08                
    ##                                          (1.09)      (0.86)               
    ## as.factor(ind)3                          -0.97       -0.88                
    ##                                          (1.08)      (0.85)               
    ## as.factor(ind)4                          -0.95       -0.82                
    ##                                          (1.08)      (0.86)               
    ## as.factor(ind)5                          -0.80       -0.59                
    ##                                          (1.10)      (0.87)               
    ## as.factor(time)2                                     -3.88 ***            
    ##                                                      (0.85)               
    ## as.factor(time)3                                     -3.99 ***            
    ##                                                      (0.85)               
    ## as.factor(time)4                                     -1.39                
    ##                                                      (0.85)               
    ## as.factor(time)5                                     -5.94 ***            
    ##                                                      (0.85)               
    ## I(x - x..bari. - x..bar.t)                                        4.99 ***
    ##                                                                  (0.29)   
    ## --------------------------------------------------------------------------
    ## R^2                           0.86        0.86        0.92        0.75    
    ## Adj. R^2                      0.86        0.85        0.91        0.75    
    ## Num. obs.                   100         100         100         100       
    ## RMSE                          3.34        3.38        2.68        4.44    
    ## ==========================================================================
    ## *** p < 0.001, ** p < 0.01, * p < 0.05  
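
    The table reports overall R-squared values only. What follows is a minimal sketch of one way to obtain the within R-squared, assuming it is defined as the share of the variation left after the fixed effects that x explains, i.e. 1 - RSS(full LSDV model) / RSS(fixed effects only). The data frame d and the names draw, mu, delta, full, fe_only, fwl and within_r2 are illustrative and not part of the code above; only base R is used.

    ```r
    # Hypothetical balanced design: 5 individuals x 5 periods x 4 draws per cell,
    # with one effect per individual, one per period, and a true slope of 5.
    set.seed(100)
    d <- expand.grid(ind = 1:5, time = 1:5, draw = 1:4)
    d$x <- rnorm(nrow(d), 0, 2)
    mu    <- rnorm(5, sd = 3)   # one value per individual
    delta <- rnorm(5, sd = 3)   # one value per period
    d$y <- 5 * d$x + mu[d$ind] + delta[d$time] + rnorm(nrow(d))

    # Full LSDV model and the model containing only the fixed effects
    full    <- lm(y ~ x + factor(ind) + factor(time), data = d)
    fe_only <- lm(y ~ factor(ind) + factor(time), data = d)

    # Within R-squared: 1 - RSS(full) / RSS(fixed effects only)
    within_r2 <- 1 - sum(resid(full)^2) / sum(resid(fe_only)^2)

    # Cross-check via Frisch-Waugh-Lovell: regress y net of the fixed effects
    # on x net of the fixed effects and read off the ordinary R-squared.
    y_tilde <- resid(fe_only)
    x_tilde <- resid(lm(x ~ factor(ind) + factor(time), data = d))
    fwl <- lm(y_tilde ~ x_tilde)

    c(overall = summary(full)$r.squared,
      within  = within_r2,
      fwl     = summary(fwl)$r.squared)
    ```

    Adding a third factor(...) term to both models (for example a trimester effect) should extend the same calculation to three sets of fixed effects.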
    

    The plm package provides the within estimator directly, if you wish to explore it.
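
    As a sketch of the plm route, assuming a panel with one row per (ind, time) pair, which is the layout plm is designed for and which the simulated dt above (four rows per pair) does not have. The data frame panel and the names twoway, r2_plm and r2_manual are hypothetical; the calls used are plm() with model = "within" and effect = "twoways", and r.squared(), which is understood here to return the R-squared of the within-transformed model.

    ```r
    library(plm)

    # Hypothetical balanced panel: 10 individuals observed over 8 periods
    set.seed(100)
    panel <- expand.grid(ind = 1:10, time = 1:8)
    panel$x <- rnorm(nrow(panel), 0, 2)
    mu    <- rnorm(10, sd = 3)  # one value per individual
    delta <- rnorm(8, sd = 3)   # one value per period
    panel$y <- 5 * panel$x + mu[panel$ind] + delta[panel$time] + rnorm(nrow(panel))

    # Two-way within (fixed effects) estimator
    twoway <- plm(y ~ x, data = panel, index = c("ind", "time"),
                  model = "within", effect = "twoways")
    r2_plm <- r.squared(twoway)

    # Cross-check against the LSDV residual sums of squares
    full    <- lm(y ~ x + factor(ind) + factor(time), data = panel)
    fe_only <- lm(y ~ factor(ind) + factor(time), data = panel)
    r2_manual <- 1 - sum(resid(full)^2) / sum(resid(fe_only)^2)

    c(plm = r2_plm, manual = r2_manual)
    ```

    For more than two sets of fixed effects, one option, assuming the fixest package is available, is feols(y ~ x | ind + time, data = panel) followed by r2(model, type = "wr2").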