So I'm working with the following data set:
# A tibble: 1,136 x 17
ccode year vanhdemo pcgnp left ainew sdnew milctr2 britinfl lpop iwar cwar popinc pcginc polrtnew lag_ainew lag_sdnew
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1980 18.7 11.3 0 1 1 0 1 19.2 0 0 1.01 7.51 7 NA NA
2 2 1981 18.7 12.3 0 1 1 0 1 19.3 0 0 1.01 7.99 7 1 1
3 2 1982 18.7 13.2 0 1 1 0 1 19.3 0 0 1.01 7.39 7 1 1
4 2 1983 18.7 14.2 0 1 1 0 1 19.3 0 0 1.01 7.69 7 1 1
5 2 1984 16.1 15.5 0 1 1 0 1 19.3 0 0 1.01 9.66 7 1 1
6 2 1985 16.1 16.5 0 1 1 0 1 19.3 0 0 1.01 6.24 7 1 1
7 2 1986 16.1 17.5 0 1 1 0 1 19.3 0 0 1.01 5.86 7 1 1
8 2 1987 16.1 18.6 0 2 2 0 1 19.3 0 0 1.01 6.39 7 1 1
9 20 1980 25.6 10.2 0 1 1 0 1 17.0 0 0 1.08 9.01 7 NA NA
10 20 1981 25.6 10.7 0 1 1 0 1 17.0 0 0 1.08 5.77 7 1 1
# ... with 1,126 more rows
As you can see, R recognises the variable year as an integer. Originally, the column values were numerical, but I converted them to integer. However, when I run the following code (using the panelAR package), I run into trouble:
panelAR(vanhdemo ~ pcgnp + left + lpop + iwar + milctr2 + britinfl, data = dat,
panelVar = "ccode", timeVar = "year", autoCorr = "psar1", panelCorrMethod = "pcse",
rho.na.rm = TRUE, panel.weight = "t-1", bound.rho = TRUE)
I get this error message:
Error: The time variable must be defined as an integer.
I can't understand what I'm doing wrong here. If I recreate a part of the data set (as below), the model runs fine. So is the problem then rooted in the data set (originally a .dta file)? I can definitely upload it if anyone is interested in looking at it.
Here's a small example of the same data:
ccode <- c(rep(2,8), rep(20, 2))
year <- c(1980:1987, 1980, 1981)
vanhdemo <- c(rep(18.7, 4), rep(16.1, 4), rep(25.6, 2))
pcgnp <- c(11.3, 12.3 , 13.2, 14.2, 15.5, 16.5, 17.5, 18.6, 10.2, 10.7)
dat <- data.frame(ccode, year, vanhdemo, pcgnp)
The error has to do with the different behavior of data.frames and tibbles. Tibbles are a special kind of data.frame that 'prevent dimension dropping'. If you subset a tibble with [ and a single column name, you always get back a single-column tibble. With a data.frame, the result depends on how you subset: you may get back a vector or a data.frame. The error you see results from this distinction. Internally, panelAR() subsets the data object for the time variable:
time.vec <- data[, timeVar]
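So, if 'data' is a tibble, 'time.vec' will be a single-column tibble, whereas if 'data' is a data.frame, 'time.vec' will be a vector. panelAR() then checks whether 'time.vec' is an integer vector. A single-column tibble is a list rather than an atomic vector, so the check fails and the error is thrown whenever the data object is a tibble, even though the column inside it is an integer.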
You can see this behavior in your sample data as follows:
# make a tibble
dat_tib <- tibble::as_tibble(dat)
# returns a vector
dat[, "year"]
# returns a data.frame
dat["year"]
# returns a tibble
dat_tib[, "year"]
# returns a tibble
dat_tib["year"]
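Given that, one likely workaround is to convert the tibble back to a plain data.frame before passing it to panelAR(). The following is a minimal sketch using the sample data, not a tested fix for your full data set. It assumes the tibble package is installed, the name dat_df is only illustrative, and is.integer() stands in here for whatever type check panelAR() actually performs.

```r
# Sample data from the question, with year stored explicitly as integer
dat <- data.frame(
  ccode = c(rep(2, 8), rep(20, 2)),
  year  = as.integer(c(1980:1987, 1980, 1981))
)
dat_tib <- tibble::as_tibble(dat)

# [, j] on a tibble gives a one-column tibble, which is not an integer vector
is.integer(dat_tib[, "year"])

# [[ extracts the column itself, for tibbles and data.frames alike
is.integer(dat_tib[["year"]])

# Convert to a plain data.frame so that [, j] drops to a vector again
dat_df <- as.data.frame(dat_tib)
is.integer(dat_df[, "year"])

# Then pass the converted object instead of the tibble (not run here):
# panelAR(..., data = dat_df, panelVar = "ccode", timeVar = "year", ...)
```

If your data came from a .dta file via a tidyverse reader, it was probably imported as a tibble, which would explain why the hand-built data.frame works while the imported data does not.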