Search code examples
rplmnon-linear-regression

Get fitted values of a log dependent variable to the original panel data set in R


I am trying to combine the fitted value of the dependent variable of log-log model. My data set is a unbalanced panel. I tried to do it in the way as indicated here. But my question is different because I have already converted my big data set into a plm object and it's a log dependent variable.

My simple data set can be accessed via the following codes.

dat = structure(list(Time = structure(c(9L, 7L, 15L, 1L, 17L, 13L, 
11L, 3L, 23L, 21L, 19L, 5L, 10L, 8L, 16L, 2L, 18L, 14L, 12L, 
4L, 24L, 22L, 20L, 6L), .Label = c("Apr-00", "Apr-01", "Aug-00", 
"Aug-01", "Dec-00", "Dec-01", "Feb-00", "Feb-01", "Jan-00", "Jan-01", 
"Jul-00", "Jul-01", "Jun-00", "Jun-01", "Mar-00", "Mar-01", "May-00", 
"May-01", "Nov-00", "Nov-01", "Oct-00", "Oct-01", "Sep-00", "Sep-01"
), class = "factor"), Firm = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Out = c(161521L, 
142452L, 365697L, 355789L, 376843L, 258762L, 255447L, 188545L, 
213663L, 273209L, 317468L, 238668L, 241286L, 135288L, 363609L, 
318472L, 446279L, 390230L, 118945L, 174887L, 183770L, 197832L, 
317468L, 238668L), Lab = c(261L, 334L, 156L, 134L, 159L, 119L, 
41L, 247L, 251L, 62L, 525L, 217L, 298L, 109L, 7L, NA, 0L, 50L, 
143L, 85L, 80L, 214L, 525L, 217L), Cap = c(13L, 15L, 14L, 12L, 
15L, 12L, 45L, 75L, NA, 12L, 15L, 16L, 42L, 45L, 24L, 56L, 12L, 
12L, 45L, NA, 15L, 12L, 15L, 16L)), .Names = c("Time", "Firm", 
"Out", "Lab", "Cap"), class = "data.frame", row.names = c(NA, 
-24L))

My data set looks like below and with missing data of the predictors.

    +--------+------+--------+-----+-----+
|  Time  | Firm |  Out   | Lab | Cap |
+--------+------+--------+-----+-----+
| Jan-00 | A    | 161521 | 261 |  13 |
| Feb-00 | A    | 142452 | 334 |  15 |
| Mar-00 | A    | 365697 | 156 |  14 |
| Apr-00 | A    | 355789 | 134 |  12 |
| May-00 | A    | 376843 | 159 |  15 |
| Jun-00 | A    | 258762 | 119 |  12 |
| Jul-00 | A    | 255447 |  41 |  45 |
| Aug-00 | A    | 188545 | 247 |  75 |
| Sep-00 | A    | 213663 | 251 |     |
| Oct-00 | A    | 273209 |  62 |  12 |
| Nov-00 | A    | 317468 | 525 |  15 |
| Dec-00 | A    | 238668 | 217 |  16 |
| Jan-01 | B    | 241286 | 298 |  42 |
| Feb-01 | B    | 135288 | 109 |  45 |
| Mar-01 | B    | 363609 |   7 |  24 |
| Apr-01 | B    | 318472 |     |  56 |
| May-01 | B    | 446279 |   0 |  12 |
| Jun-01 | B    | 390230 |  50 |  12 |
| Jul-01 | B    | 118945 | 143 |  45 |
| Aug-01 | B    | 174887 |  85 |     |
| Sep-01 | B    | 183770 |  80 |  15 |
| Oct-01 | B    | 197832 | 214 |  12 |
| Nov-01 | B    | 317468 | 525 |  15 |
| Dec-01 | B    | 238668 | 217 |  16 |
+--------+------+--------+-----+-----+ 

I can get the fitted values using the following codes

library(zoo)
library(plm)

Sys.setlocale("LC_TIME", "English")
dat["time1"] <- as.yearmon(dat$Time,format="%b-%y")
pdat <-pdata.frame(dat,index=c("Firm","time1"))
Model1<- plm(log(Out) ~ lag(log(Cap), 1) + log(Lab + 1),
         model = "within", data=pdat)
summary(Model1)

library(data.table)
FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)

But the observations of the pdat is 24 observations FV_Log is 19 observations, so I am unable to merge it to pdat. My pdat is large with several thousands of observations and I have created many variables using codes. So any help to merge the fitted value into the original pdat with properly (without changing the order) would be much appreciated.


Solution

  • The problem you're facing is twofold. First, you have NA's that your model is dropping when calculating the residuals so naturally your result will have fewer elements. The second problem is that you're lagging one of your covariates, which produces an NA for your first observation (first temporally) -after all that's what a lag does. To overcome that problem you need additional data from the previous time period or you have to forgo that observation. Assuming you do not have access to data from the previous time period, this is how I would go about resolving the issue.

     #First I would create a new variable for CAP and just lag and log that separately, rather than applying the function in the formula of the model itself
     pdat$Cap.lag.ln<-lag(log(pdat$Cap), 1)
     pdat$Cap<-NULL #deleting the old variable to clear up the mess
    
     #Dont necessarily need the na.omit but it couldn't hurt...
     Model1<- plm(log(Out) ~ Cap.lag.ln + log(Lab + 1),
        model = "within", data=pdat, na.omit=TRUE)
     FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)
    
     #Now this is where you reduce your original dataset (pdat) by getting rid of the NAs
     pdat2<-na.omit(pdat)
     #You will notice that they're the same dimensions now and you can cbind
     pdat3 <-cbind(pdat2,FV_Log)
          Time Firm    Out Lab    time1 Cap.lag.ln       V1
     1: Feb-00    A 142452 334 Feb 2000          2 12.41211
     2: Mar-00    A 365697 156 Mar 2000          2 12.54861
     3: Apr-00    A 355789 134 Apr 2000          2 12.57580
     4: May-00    A 376843 159 May 2000          2 12.54520
     5: Jun-00    A 258762 119 Jun 2000          2 12.59702
     6: Jul-00    A 255447  41 Jul 2000          2 12.78611
     7: Aug-00    A 188545 247 Aug 2000          3 12.28887
     8: Sep-00    A 213663 251 Sep 2000          4 12.10858
     9: Nov-00    A 317468 525 Nov 2000          2 12.33084
     10: Dec-00   A 238668 217 Dec 2000          2 12.48949
     11: Feb-01   B 135288 109 Feb 2001          3 12.20776
     12: Mar-01   B 363609   7 Mar 2001          3 12.67984
     13: May-01   B 446279   0 May 2001          4 12.87698
     14: Jun-01   B 390230  50 Jun 2001          2 12.52360
     15: Jul-01   B 118945 143 Jul 2001          2 12.33665
     16: Aug-01   B 174887  85 Aug 2001          3 12.25209
     17: Oct-01   B 197832 214 Oct 2001          2 12.26445
     18: Nov-01   B 317468 525 Nov 2001          2 12.10331
     19: Dec-01   B 238668 217 Dec 2001          2 12.26195
    

    And if you want to retrieve those NA's back, you can do the following:

     pdat3 <-as.data.frame(pdat3)
     pdat4<-merge(pdat3, pdat, 
        by=c("Time","Firm","Out", "Lab","time1"), 
        all.x=TRUE,all.y=TRUE)
    
       Time Firm    Out Lab    time1 Cap.lag.ln       V1 Cap
     1  Apr-00    A 355789 134 Apr 2000          2 12.57580  12
     2  Apr-01    B 318472  NA Apr 2001         NA       NA  56
     3  Aug-00    A 188545 247 Aug 2000          3 12.28887  75
     4  Aug-01    B 174887  85 Aug 2001          3 12.25209  NA
     5  Dec-00    A 238668 217 Dec 2000          2 12.48949  16
     6  Dec-01    B 238668 217 Dec 2001          2 12.26195  16
     7  Feb-00    A 142452 334 Feb 2000          2 12.41211  15
     8  Feb-01    B 135288 109 Feb 2001          3 12.20776  45
     9  Jan-00    A 161521 261 Jan 2000         NA       NA  13
     10 Jan-01    B 241286 298 Jan 2001         NA       NA  42
     11 Jul-00    A 255447  41 Jul 2000          2 12.78611  45
     12 Jul-01    B 118945 143 Jul 2001          2 12.33665  45
     13 Jun-00    A 258762 119 Jun 2000          2 12.59702  12
     14 Jun-01    B 390230  50 Jun 2001          2 12.52360  12
     15 Mar-00    A 365697 156 Mar 2000          2 12.54861  14
     16 Mar-01    B 363609   7 Mar 2001          3 12.67984  24
     17 May-00    A 376843 159 May 2000          2 12.54520  15
     18 May-01    B 446279   0 May 2001          4 12.87698  12
     19 Nov-00    A 317468 525 Nov 2000          2 12.33084  15
     20 Nov-01    B 317468 525 Nov 2001          2 12.10331  15
     21 Oct-00    A 273209  62 Oct 2000         NA       NA  12
     22 Oct-01    B 197832 214 Oct 2001          2 12.26445  12
     23 Sep-00    A 213663 251 Sep 2000          4 12.10858  NA
     24 Sep-01    B 183770  80 Sep 2001         NA       NA  15