Search code examples
rcbindplm

How to combine a column from a data table to panel data structure


I have been working with a large panel data set. My simple data set look like below.


+--------+------+------+-----+-----+--------+
|  Time  | Firm | Land | Lab | Cap |   GL   |
+--------+------+------+-----+-----+--------+
| Jan-00 | A    |   25 | 261 |  13 | 161521 |
| Feb-00 | A    |   25 | 334 |  15 | 142452 |
| Mar-00 | A    |   25 | 156 |  14 | 365697 |
| Apr-00 | A    |   28 | 134 |  12 | 355789 |
| May-00 | A    |   28 | 159 |  15 | 376843 |
| Jun-00 | A    |   28 | 119 |  12 | 258762 |
| Jul-00 | A    |   28 |  41 |  45 | 255447 |
| Aug-00 | A    |   28 | 247 |  75 | 188545 |
| Sep-00 | A    |   28 | 251 |  41 | 213663 |
| Oct-00 | A    |   30 |  62 |  12 | 273209 |
| Nov-00 | A    |   30 | 525 |  15 | 317468 |
| Dec-00 | A    |   30 | 217 |  16 | 238668 |
| Jan-01 | B    |   42 | 298 |  42 | 241286 |
| Feb-01 | B    |   42 | 109 |  45 | 135288 |
| Mar-01 | B    |   42 |   7 |  24 | 363609 |
| Apr-01 | B    |   42 |  12 |  56 | 318472 |
| May-01 | B    |   42 |   0 |  12 | 446279 |
| Jun-01 | B    |   45 |  50 |  12 | 390230 |
| Jul-01 | B    |   45 | 143 |  45 | 118945 |
| Aug-01 | B    |   45 |  85 |  25 | 174887 |
| Sep-01 | B    |   45 |  80 |  15 | 183770 |
| Oct-01 | B    |   45 | 214 |  12 | 197832 |
| Nov-01 | B    |   45 | 525 |  15 | 317468 |
| Dec-01 | B    |   45 | 217 |  16 | 238668 |
+--------+------+------+-----+-----+--------+

The above dataframe can be accessed using the following codes.

structure(list(Time = structure(c(9L, 7L, 15L, 1L, 17L, 13L, 
11L, 3L, 23L, 21L, 19L, 5L, 10L, 8L, 16L, 2L, 18L, 14L, 12L, 
4L, 24L, 22L, 20L, 6L), .Label = c("Apr-00", "Apr-01", "Aug-00", 
"Aug-01", "Dec-00", "Dec-01", "Feb-00", "Feb-01", "Jan-00", "Jan-01", 
"Jul-00", "Jul-01", "Jun-00", "Jun-01", "Mar-00", "Mar-01", "May-00", 
"May-01", "Nov-00", "Nov-01", "Oct-00", "Oct-01", "Sep-00", "Sep-01"
), class = "factor"), Firm = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Land = c(25L, 
25L, 25L, 28L, 28L, 28L, 28L, 28L, 28L, 30L, 30L, 30L, 42L, 42L, 
42L, 42L, 42L, 45L, 45L, 45L, 45L, 45L, 45L, 45L), Lab = c(261L, 
334L, 156L, 134L, 159L, 119L, 41L, 247L, 251L, 62L, 525L, 217L, 
298L, 109L, 7L, 12L, 0L, 50L, 143L, 85L, 80L, 214L, 525L, 217L
), Cap = c(13L, 15L, 14L, 12L, 15L, 12L, 45L, 75L, 41L, 12L, 
15L, 16L, 42L, 45L, 24L, 56L, 12L, 12L, 45L, 25L, 15L, 12L, 15L, 
16L), GL = c(161521L, 142452L, 365697L, 355789L, 376843L, 258762L, 
255447L, 188545L, 213663L, 273209L, 317468L, 238668L, 241286L, 
135288L, 363609L, 318472L, 446279L, 390230L, 118945L, 174887L, 
183770L, 197832L, 317468L, 238668L)), .Names = c("Time", "Firm", 
"Land", "Lab", "Cap", "GL"), class = "data.frame", row.names = c(NA, 
-24L))

I converted the data into a panel structure as below.

library(zoo)
library(plm)

Sys.setlocale("LC_TIME", "English")
dat["time1"] <- as.yearmon(dat$Time,format="%b-%y")
pdat <-pdata.frame(dat,index=c("Firm","time1"))

Original Analysis

pdat$Cap.lag.ln<-lag(log(pdat$Cap), 1)
pdat$Cap.2lag.ln<-lag(log(pdat$Cap), 2)
pdat$Lab.ln<-log(pdat$Lab+1)
pdat$Lab.lag.ln<-lag(log(pdat$Lab+1), 1)

Model1<- plm(log(Land) ~ Cap.lag.ln + Cap.2lag.ln + Lab.ln + Lab.lag.ln,
         model = "within", data=pdat)
summary(Model1)

FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)

Just to omit the NAs in pdat resulted from lags to combine fitted values

pdat2<-na.omit(pdat)

The problem

I want to combine these fitted values to the pdat2 but need to retain the panel structure as it is. But if I go for the following cbind then the panel structure is disappeared.

pdat3 <-cbind(pdat2,FV_Log)

Even I tried omitting old variables and the following way.

pdat$Cap<-NULL
pdat$Lab<-NULL
pdat2 <- na.omit(pdat)
pdat2$FV_Log <- (Model1$model[[1]] - Model1$residuals)

and run my model2 using data=pdat2 but the following error message comes up.
Model2<-plm(log(GL)~ Cap.lag.ln + Cap.2lag.ln + Lab.ln + Lab.lag.ln
+ FV_Log, data=pdat2, model = "within")

**Error in model.matrix.pFormula(formula, data, rhs = 1, model = model,  : 
NA in the individual index variable
In addition: Warning message:
In `[.data.frame`(index, as.numeric(rownames(mf)), ) :
NAs introduced by coercion**

Can anybody please assist me to identify this problem and to run the model 2 in plm structure. Thank you


Solution

  • Updating my answer after OP provided code needed for replication.

    I think using data.table to create FV_Log is a complicating factor.

    I ran the following codes immediately after the creation of pdat3 above.

    # convert FV_Log to just data.frame
    FV_Log<-data.frame(FV_Log)
    
    # bind FV_Log to pdat2 using use pdata.frame to preserve structure.
    pdat3_fix<-pdata.frame(cbind(pdat2,FV_Log),index=c("Firm","time1")) 
    

    Here is pdat3_fix, which is identical to pdat2 with the exception of the new column:

    enter image description here

    Note that the resulting data.frame is also an object of class pdata.frame:

    > class(pdat3_fix)
    [1] "pdata.frame" "data.frame"