Search code examples
systempanelequation

systemfit 3SLS Testing for Overidentification Restrictions


currently I'm struggling to find a good way to perform the Hansen/Sargan tests of Overidentification restrictions within a Three-Stage Least Squares model (3SLS) in panel data using R. I was digging the whole day in different networks and couldn't find a way of depicting the tests in R using the well-known systemfit package.

Currently, my code is simple.

    
violence_c_3sls <- Crime ~ ln_GDP +I(ln_GDP^2) + ln_Gini 
income_c_3sls  <-ln_GDP  ~ Crime + ln_Gini 
gini_c_3sls <- ln_Gini ~ ln_GDP + I(ln_GDP^2) + Crime 

inst <- ~ Educ_Gvmnt_Exp + I(Educ_Gvmnt_Exp^2)+ Health_Exp + Pov_Head_Count_1.9 

system_c_3sls <- list(violence_c_3sls, income_c_3sls, gini_c_3sls)

fitsur_c_3sls <-systemfit(system_c_3sls, "3SLS",inst=inst, data=df_new, methodResidCov = "noDfCor" )
summary(fitsur_c_3sls)

However, adding more instruments to create an over-identified system do not yield in an output of the Hansen/Sargan test, thus I assume the test should be executed aside from the output and probably associated to systemfit class object.

Thanks in advance.


Solution

  • With g equations, l exogenous variables, and k regressors, the Sargan test for 3SLS isSargan test

    where u is the stacked residuals, \Sigma is the estimated residual covariance, and P_W is the projection matrix on the exogenous variables. See Ch 12.4 from Davidson & MacKinnon ETM.

    Calculating the Sargan test from systemfit should look something like this:

    sargan.systemfit=function(results3sls){
      result <- list()
    
      u=as.matrix(resid(results3sls)) #model residuals, n x n_eq
      n_eq=length(results3sls$eq) # number of equations
      n=nrow(u) #number of observations
      n_reg=length(coef(results3sls)) # total number of regressors
      
      w=model.matrix(results3sls,which='z') #Matrix of instruments, in block diagonal form with one block per equation
      
      #Need to aggregate into a single block (in case different instruments used per equation)
      w_list=lapply(X = 1:n_eq,FUN = function(eq_i){
        this_eq_label=results3sls$eq[[eq_i]]$eqnLabel
        this_w=w[str_detect(rownames(w),this_eq_label),str_detect(colnames(w),this_eq_label)]
        colnames(this_w)=str_remove(colnames(this_w),paste0(this_eq_label,'_'))
        return(this_w)
      })
      w=do.call(cbind,w_list)
      w=w[,!duplicated(colnames(w))]
      n_inst=ncol(w) #w is n x n_inst, where n_inst is the number of unique instruments/exogenous variables
      
      #estimate residual variance (or use residCov, should be asymptotically equivalent)
      var_u=crossprod(u)/n #var_u=results3sls$residCov
      
      P_w=w%*%solve(crossprod(w))%*%t(w) #Projection matrix on instruments w
      
    
      #as.numeric(u) vectorizes the residuals into a n_eq*n x 1 vector.
      
      result$statistic <- as.numeric(t(as.numeric(u))%*%kronecker(solve(var_u),P_w)%*%as.numeric(u))
    
      result$df <- n_inst*n_eq-n_reg
      
      result$p.value <- 1 - pchisq(result$statistic, result$df)
      result$method = paste("Sargan over-identifying restrictions test")
    
      return(result)
      
      
    }