
Compute the difference between two different regressions in R


I am trying to estimate the change in the coefficient on the variable "Efficiency Ratio" (ER) between 2011 and 2014 (i.e., ∆=2014-2011), together with its p-value. I have already run separate pooled OLS regressions for 2011 and 2014, but I don't know how to obtain the estimated coefficient and p-value for this change. My goal is to determine whether the dependent variable has become less positively sensitive to ER.

Below, I present the individual regressions for 2011 and 2014, and part of my database. I would appreciate any insights on how to do this in R. Thank you.


pdata2011<-pdata.frame(paneldata2011, index = c("BANKS","YEARS"))

pooled2011<-plm(VCTC ~ ER + log(TA) + log(GDP), data = pdata2011,  model = "pooling")


pdata2014<-pdata.frame(paneldata2014, index = c("BANKS","YEARS"))

pooled2014<-plm(VCTC ~ ER + log(TA) + log(GDP), data = pdata2014, model = "pooling")

  BANKS YEARS    VCTC         ER           TA         GDP
    1   2014    0.00000000  0.8559100   235193.8    534678.1
    1   2011    0.16887878  1.5123620   301355.0    522645.5
    2   2014    0.87297022  0.6225519   809343.3    1801480.1
    2   2011    0.85148515  0.6321466   777083.1    1789140.7
    3   2014    0.24422236  0.4315355   2573915.1   10438529.2
    3   2011    0.24970615  0.4156023   1853465.0   7551500.4
    4   2014    0.33174224  0.3927662   2457455.2   10438529.2
    4   2011    0.28012834  0.4291702   1877624.1   7551500.4
    5   2014    0.31638913  0.3525573   2697975.7   10438529.2
    5   2011    0.32945877  0.3633482   1949372.7   7551500.4
    6   2014    0.22575998  0.3450020   3320881.7   10438529.2
    6   2011    0.21708543  0.3596391   2456488.5   7551500.4
...
    34  2014    0.94692763  0.7477073   274119.0    17521746.5
    34  2011    0.93822571  0.7259823   216827.0    15542581.1
    35  2014    0.86932004  0.5752208   1687155.0   17521746.5
    35  2011    0.85889245  0.6049802   1313867.0   15542581.1

Solution

  • You can do as @LynnL proposed and include an interaction term. If the effect of ER differs significantly between the two years (i.e., the difference is not zero), this term will have a small p-value.

    Ideally, provide the data next time, because we have no idea what YEARS etc. are or whether the two data.frames can be combined. Below I suggest using a z-score to look at the difference between the two coefficients:

    library(plm)
    

    Using the first 12 rows of your data, assuming this is the combined dataset:

        df = structure(list(BANKS = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 
    5L, 6L, 6L), YEARS = c(2014L, 2011L, 2014L, 2011L, 2014L, 2011L, 
    2014L, 2011L, 2014L, 2011L, 2014L, 2011L), VCTC = c(0, 0.16887878, 
    0.87297022, 0.85148515, 0.24422236, 0.24970615, 0.33174224, 0.28012834, 
    0.31638913, 0.32945877, 0.22575998, 0.21708543), ER = c(0.85591, 
    1.512362, 0.6225519, 0.6321466, 0.4315355, 0.4156023, 0.3927662, 
    0.4291702, 0.3525573, 0.3633482, 0.345002, 0.3596391), TA = c(235193.8, 
    301355, 809343.3, 777083.1, 2573915.1, 1853465, 2457455.2, 1877624.1, 
    2697975.7, 1949372.7, 3320881.7, 2456488.5), GDP = c(534678.1, 
    522645.5, 1801480.1, 1789140.7, 10438529.2, 7551500.4, 10438529.2, 
    7551500.4, 10438529.2, 7551500.4, 10438529.2, 7551500.4)), class = "data.frame", row.names = c(NA, 
    -12L))
    

    Starting from here, run the regression separately for each year:

    df$YEARS = factor(df$YEARS)
    pooled2011<-plm(VCTC ~ ER + log(TA) + log(GDP),model = "pooling",data=pdata.frame(subset(df,YEARS==2011),index=c("BANKS","YEARS")))
    
    pooled2014<-plm(VCTC ~ ER + log(TA) + log(GDP),model = "pooling",data=pdata.frame(subset(df,YEARS==2014),index=c("BANKS","YEARS")))
    

    In this example each regression is based on only 6 data points, so run it on your whole dataset. Then extract the ER coefficient and its standard error from each model:

    b1 <- summary(pooled2011)$coefficients["ER",1]
    se1 <- summary(pooled2011)$coefficients["ER",2]
    b2 <- summary(pooled2014)$coefficients["ER",1]
    se2 <- summary(pooled2014)$coefficients["ER",2]
    

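    The estimate of the change is simply the difference between the two regression coefficients (assuming the variables are on the same scale in both years). Its standard error combines the two individual standard errors, treating the two estimates as independent, which gives a z-score and a p-value. Note that the function computes b1 - b2, so with b1 from 2011 the result is 2011 minus 2014; swap the arguments to get 2014 minus 2011: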

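    compare.coeff <- function(b1,se1,b2,se2){
      # difference between the two coefficients (first minus second)
      delta = b1-b2
      # standard error of the difference, treating the two estimates as independent
      se = sqrt(se1^2+se2^2)
      Zscore = (delta)/se
      # two-sided p-value from the standard normal distribution
      p_value = 2*pnorm(-abs(Zscore))
      c(delta=delta,se=se,Zscore=Zscore,p_value=p_value)
    }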
    
    compare.coeff(b1,se1,b2,se2)
         delta         se     Zscore    p_value 
    -1.7461849  7.4390338 -0.2347328  0.8144162 
    

    You can also check out books or chapters on this topic, basically anything that describes the use of interaction terms. I also answered a similar question before, which you can check as well.
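
The interaction-term approach mentioned at the top of this answer can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the first 12 rows of the question's data form the combined dataset, and it uses base R's `lm()` rather than `plm()` because a pooled OLS fit gives the same estimates and this keeps the sketch free of extra packages. The object names `fit` and `interaction_row` are made up for the example.

```r
# First 12 rows of the question's data (banks 1-6), treated as the combined dataset
df <- data.frame(
  BANKS = rep(1:6, each = 2),
  YEARS = factor(rep(c(2014, 2011), times = 6)),
  VCTC  = c(0, 0.16887878, 0.87297022, 0.85148515, 0.24422236, 0.24970615,
            0.33174224, 0.28012834, 0.31638913, 0.32945877, 0.22575998, 0.21708543),
  ER    = c(0.85591, 1.512362, 0.6225519, 0.6321466, 0.4315355, 0.4156023,
            0.3927662, 0.4291702, 0.3525573, 0.3633482, 0.345002, 0.3596391),
  TA    = c(235193.8, 301355, 809343.3, 777083.1, 2573915.1, 1853465,
            2457455.2, 1877624.1, 2697975.7, 1949372.7, 3320881.7, 2456488.5),
  GDP   = c(534678.1, 522645.5, 1801480.1, 1789140.7,
            rep(c(10438529.2, 7551500.4), times = 4))
)

# Factor levels sort as "2011", "2014", so 2011 is the reference year.
# ER * YEARS expands to ER + YEARS + ER:YEARS.
fit <- lm(VCTC ~ ER * YEARS + log(TA) + log(GDP), data = df)

# The row "ER:YEARS2014" holds the change in the ER slope from 2011 to 2014
# (estimate, standard error, t value, p-value).
interaction_row <- coef(summary(fit))["ER:YEARS2014", ]
print(interaction_row)
```

Here only the ER slope is allowed to differ by year, while the other coefficients are shared, so the estimate will generally not equal the difference from the two separate regressions. Writing the formula as `VCTC ~ (ER + log(TA) + log(GDP)) * YEARS` lets every coefficient differ by year, in which case the `ER:YEARS2014` point estimate equals the 2014 ER coefficient minus the 2011 one; the standard error can still differ because the model uses one pooled residual variance.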