Compute the difference between two different regressions in R

I am trying to compute the coefficient and its p-value for the change in the effect of the variable "Efficiency Ratio" (ER) between 2011 and 2014 (i.e., ∆=2014-2011). I have already estimated separate regressions for 2011 and 2014 using pooled OLS, but I don't know the next step: how to obtain the estimated coefficient and p-value for this change. I want to determine whether the dependent variable has become less positively sensitive to the ER variable.

Below, I present the individual regressions for 2011 and 2014, and part of my database. I would appreciate any insights on how to do this in R. Thank you.

pdata2011<-pdata.frame(paneldata2011, index = c("BANKS","YEARS"))

pooled2011<-plm(VCTC ~ ER + log(TA) + log(GDP), data = pdata2011,  model = "pooling")

pdata2014<-pdata.frame(paneldata2014, index = c("BANKS","YEARS"))

pooled2014<-plm(VCTC ~ ER + log(TA) + log(GDP), data = pdata2014, model = "pooling")

  BANKS YEARS    VCTC         ER           TA         GDP
    1   2014    0.00000000  0.8559100   235193.8    534678.1
    1   2011    0.16887878  1.5123620   301355.0    522645.5
    2   2014    0.87297022  0.6225519   809343.3    1801480.1
    2   2011    0.85148515  0.6321466   777083.1    1789140.7
    3   2014    0.24422236  0.4315355   2573915.1   10438529.2
    3   2011    0.24970615  0.4156023   1853465.0   7551500.4
    4   2014    0.33174224  0.3927662   2457455.2   10438529.2
    4   2011    0.28012834  0.4291702   1877624.1   7551500.4
    5   2014    0.31638913  0.3525573   2697975.7   10438529.2
    5   2011    0.32945877  0.3633482   1949372.7   7551500.4
    6   2014    0.22575998  0.3450020   3320881.7   10438529.2
    6   2011    0.21708543  0.3596391   2456488.5   7551500.4
    34  2014    0.94692763  0.7477073   274119.0    17521746.5
    34  2011    0.93822571  0.7259823   216827.0    15542581.1
    35  2014    0.86932004  0.5752208   1687155.0   17521746.5
    35  2011    0.85889245  0.6049802   1313867.0   15542581.1


  • You can do as @LynnL proposed and include an interaction term. If the effect of ER differs significantly between the two years (i.e., the difference is not zero), this term will have a small p-value.

    Ideally, provide the data next time, because we have no idea what YEARS etc. are, or whether the two data.frames can be combined. Below I suggest using a z-score to look at the difference between the two coefficients:


    Using the first 12 rows of your data, assuming this is the combined dataset:

    df = structure(list(BANKS = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 
    5L, 6L, 6L), YEARS = c(2014L, 2011L, 2014L, 2011L, 2014L, 2011L, 
    2014L, 2011L, 2014L, 2011L, 2014L, 2011L), VCTC = c(0, 0.16887878, 
    0.87297022, 0.85148515, 0.24422236, 0.24970615, 0.33174224, 0.28012834, 
    0.31638913, 0.32945877, 0.22575998, 0.21708543), ER = c(0.85591, 
    1.512362, 0.6225519, 0.6321466, 0.4315355, 0.4156023, 0.3927662, 
    0.4291702, 0.3525573, 0.3633482, 0.345002, 0.3596391), TA = c(235193.8, 
    301355, 809343.3, 777083.1, 2573915.1, 1853465, 2457455.2, 1877624.1, 
    2697975.7, 1949372.7, 3320881.7, 2456488.5), GDP = c(534678.1, 
    522645.5, 1801480.1, 1789140.7, 10438529.2, 7551500.4, 10438529.2, 
    7551500.4, 10438529.2, 7551500.4, 10438529.2, 7551500.4)), class = "data.frame", row.names = c(NA, 
    -12L))

    Starting from here, run the two regressions separately, one for each year:

    df$YEARS = factor(df$YEARS)
    pooled2011<-plm(VCTC ~ ER + log(TA) + log(GDP),model = "pooling",data=pdata.frame(subset(df,YEARS==2011),index=c("BANKS","YEARS")))
    pooled2014<-plm(VCTC ~ ER + log(TA) + log(GDP),model = "pooling",data=pdata.frame(subset(df,YEARS==2014),index=c("BANKS","YEARS")))

    Here each regression is based on only 6 data points; run the same thing on your whole dataset. Then extract the ER coefficient and its standard error from each fit:

    b1 <- summary(pooled2011)$coefficients["ER",1]
    se1 <- summary(pooled2011)$coefficients["ER",2]
    b2 <- summary(pooled2014)$coefficients["ER",1]
    se2 <- summary(pooled2014)$coefficients["ER",2]

    The difference is simply the difference between the two regression coefficients (assuming all variables are on the same scale). Its standard error is the square root of the sum of the two squared standard errors, which treats the two estimates as independent:

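    # delta is b1 - b2, i.e. the 2011 coefficient minus the 2014 one
    compare.coeff <- function(b1,se1,b2,se2){
      delta = b1-b2
      se = sqrt(se1^2+se2^2)
      Zscore = (delta)/se
      p_value = 2*pnorm(-abs(Zscore))
      c(delta=delta, se=se, Zscore=Zscore, p_value=p_value)
    }

    compare.coeff(b1,se1,b2,se2)
         delta         se     Zscore    p_value 
    -1.7461849  7.4390338 -0.2347328  0.8144162 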
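
    The question defines the change as ∆ = 2014 - 2011 and asks whether the effect became less positive, so a one-sided version of the same z-test may be closer to what is wanted. This is only a sketch: it reuses the delta and se printed above (with the sign flipped), the variable names are made up for illustration, and it still assumes the two estimates are independent and approximately normal.

```r
# Values copied from the output above; there delta was b2011 - b2014.
delta_2011_minus_2014 <- -1.7461849
se_delta              <- 7.4390338

# Flip the sign so that delta is 2014 minus 2011, as in the question.
delta <- -delta_2011_minus_2014
z     <- delta / se_delta

# One-sided p-value for H1: delta < 0 (the ER effect decreased).
p_less <- pnorm(z)

# Two-sided p-value, unchanged by the sign flip.
p_two_sided <- 2 * pnorm(-abs(z))

round(c(delta = delta, z = z, p_less = p_less, p_two_sided = p_two_sided), 4)
```

    With these 12 rows the one-sided p-value is nowhere near small (roughly 0.59), so this tiny sample gives no evidence of a decrease; the full dataset may of course tell a different story.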

    You can also check out books or chapters like this, or basically anything that describes the use of the interaction term. I also answered a similar question before, which you can check as well.
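
    For completeness, the interaction-term approach mentioned at the top of this answer could look roughly like the sketch below. It makes a few assumptions: the two years can be stacked into one data frame, `lm()` is used as a stand-in for `plm(..., model = "pooling")` (both are ordinary least squares), and every regressor is interacted with the year so that the point estimate matches the two separate regressions. The names `fit_int`, `fit11` and `fit14` are made up for this example.

```r
# First 12 rows of the question's data, with YEARS as a factor (2011 = reference).
df <- data.frame(
  BANKS = rep(1:6, each = 2),
  YEARS = factor(rep(c(2014, 2011), times = 6)),
  VCTC  = c(0, 0.16887878, 0.87297022, 0.85148515, 0.24422236, 0.24970615,
            0.33174224, 0.28012834, 0.31638913, 0.32945877, 0.22575998, 0.21708543),
  ER    = c(0.85591, 1.512362, 0.6225519, 0.6321466, 0.4315355, 0.4156023,
            0.3927662, 0.4291702, 0.3525573, 0.3633482, 0.345002, 0.3596391),
  TA    = c(235193.8, 301355, 809343.3, 777083.1, 2573915.1, 1853465,
            2457455.2, 1877624.1, 2697975.7, 1949372.7, 3320881.7, 2456488.5),
  GDP   = c(534678.1, 522645.5, 1801480.1, 1789140.7, 10438529.2, 7551500.4,
            10438529.2, 7551500.4, 10438529.2, 7551500.4, 10438529.2, 7551500.4)
)

# Pooled OLS with every regressor (and the intercept) interacted with the year.
fit_int <- lm(VCTC ~ (ER + log(TA) + log(GDP)) * YEARS, data = df)

# "ER:YEARS2014" is the change in the ER slope, 2014 minus 2011,
# reported with its standard error, t value and p-value.
coef(summary(fit_int))["ER:YEARS2014", ]

# Separate per-year fits, to compare the point estimate.
fit11 <- lm(VCTC ~ ER + log(TA) + log(GDP), data = subset(df, YEARS == 2011))
fit14 <- lm(VCTC ~ ER + log(TA) + log(GDP), data = subset(df, YEARS == 2014))
coef(fit14)["ER"] - coef(fit11)["ER"]
```

    The interaction coefficient reproduces the difference between the two per-year ER coefficients, but its p-value comes from a t-test with a single pooled residual variance, so it will generally not match the z-test above exactly. Interacting only ER with YEARS (`ER * YEARS + log(TA) + log(GDP)`) is a more parsimonious alternative if the other effects can be assumed constant across years.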