
panelAR doesn't recognise my column as an integer


https://easyupload.io/3rnesm

So I'm working with the following data set:

# A tibble: 1,136 x 17
   ccode  year vanhdemo pcgnp  left ainew sdnew milctr2 britinfl  lpop  iwar  cwar popinc pcginc polrtnew lag_ainew lag_sdnew
   <dbl> <int>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>    <dbl>     <dbl>     <dbl>
 1     2  1980     18.7  11.3     0     1     1       0        1  19.2     0     0   1.01   7.51        7        NA        NA
 2     2  1981     18.7  12.3     0     1     1       0        1  19.3     0     0   1.01   7.99        7         1         1
 3     2  1982     18.7  13.2     0     1     1       0        1  19.3     0     0   1.01   7.39        7         1         1
 4     2  1983     18.7  14.2     0     1     1       0        1  19.3     0     0   1.01   7.69        7         1         1
 5     2  1984     16.1  15.5     0     1     1       0        1  19.3     0     0   1.01   9.66        7         1         1
 6     2  1985     16.1  16.5     0     1     1       0        1  19.3     0     0   1.01   6.24        7         1         1
 7     2  1986     16.1  17.5     0     1     1       0        1  19.3     0     0   1.01   5.86        7         1         1
 8     2  1987     16.1  18.6     0     2     2       0        1  19.3     0     0   1.01   6.39        7         1         1
 9    20  1980     25.6  10.2     0     1     1       0        1  17.0     0     0   1.08   9.01        7        NA        NA
10    20  1981     25.6  10.7     0     1     1       0        1  17.0     0     0   1.08   5.77        7         1         1
# ... with 1,126 more rows

As you can see, R recognises the variable year as an integer (<int>). Originally the column was numeric (double), but I converted it to integer. However, when I run the following code (using the panelAR package), I run into trouble:

panelAR(vanhdemo ~ pcgnp + left + lpop + iwar + milctr2 + britinfl, data = dat, 
        panelVar = "ccode", timeVar = "year", autoCorr = "psar1", panelCorrMethod = "pcse",
        rho.na.rm = TRUE, panel.weight = "t-1", bound.rho = TRUE)

I get this error message:

Error: The time variable must be defined as an integer.

I can't understand what I'm doing wrong here. If I recreate part of the data set (as below), the model runs fine. So is the problem rooted in the data set itself (originally a .dta file)? The full data set is available at the link above if anyone is interested in looking at it.

Here's a small example of the same data:

ccode  <- c(rep(2,8), rep(20, 2))
year  <- c(1980:1987, 1980L, 1981L)  # L suffix keeps the whole vector integer
vanhdemo <- c(rep(18.7, 4), rep(16.1, 4), rep(25.6, 2))
pcgnp  <- c(11.3, 12.3 , 13.2, 14.2, 15.5, 16.5, 17.5, 18.6, 10.2, 10.7)

dat <- data.frame(ccode, year, vanhdemo, pcgnp)
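Because panelAR() insists on an integer time variable, it is worth checking the type of year in any hand-built example too. c() coerces its arguments to a common type, so mixing an integer range such as 1980:1987 with plain numeric literals such as 1980 gives a double vector; the L suffix or as.integer() keeps it integer. A quick check (the variable names here are illustrative only):

```r
# 1980:1987 is integer, but 1980 and 1981 are double literals,
# so c() promotes the whole vector to double
year_dbl <- c(1980:1987, 1980, 1981)

# The L suffix marks integer literals, so the vector stays integer
year_int <- c(1980:1987, 1980L, 1981L)

# as.integer() converts an existing double vector (e.g. a column read from file)
year_conv <- as.integer(year_dbl)

print(class(year_dbl))                 # "numeric"
print(class(year_int))                 # "integer"
print(identical(year_conv, year_int))  # TRUE
```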

Solution

  • The error comes from the different subsetting behavior of data.frames and tibbles. Tibbles are a special kind of data.frame that 'prevent dimension dropping': if you subset a tibble with [ and a single column name, you get back a single-column tibble. A data.frame, by contrast, may return either a vector or a data.frame depending on how you subset it, and that distinction is what triggers your error. Internally, panelAR() subsets the data object for the time variable like this:

    time.vec <- data[, timeVar]
    

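    So if 'data' is a tibble, 'time.vec' will be a single-column tibble, whereas if 'data' is a data.frame, 'time.vec' will be a vector. panelAR() then checks whether 'time.vec' is an integer vector. A single-column tibble is a list-based object, not an integer vector, so the check fails and the error is thrown whenever the data object is a tibble, even though the year column inside it is an integer.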

    You can see this behavior in your sample data as follows:

    # make a tibble 
    dat_tib <- tibble::as_tibble(dat)
    
    # returns a vector
    dat[, "year"]
    
    # returns a data.frame
    dat["year"]
    
    # returns a tibble
    dat_tib[, "year"]
    
    # returns a tibble
    dat_tib["year"]
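The same mechanism can be reproduced in base R without the tibble package. In the sketch below, check_time_var() and toy are hypothetical names: the helper imitates the subset-then-check step described above (it is not panelAR's actual source), and drop = FALSE stands in for tibble-style subsetting, which never drops a single column to a vector.

```r
# Hypothetical helper (not panelAR code): subset the time column the way the
# answer describes, then test whether the result is an integer vector.
check_time_var <- function(data, timeVar) {
  time.vec <- data[, timeVar]
  is.integer(time.vec)
}

# Small base data.frame with an integer year column
toy <- data.frame(ccode = c(2L, 2L, 20L), year = c(1980L, 1981L, 1980L))

# data.frame: [, "year"] drops to an integer vector, so the check passes
ok_df <- check_time_var(toy, "year")

# drop = FALSE mimics tibble subsetting: the result is still a one-column
# data frame (a list), so is.integer() is FALSE although the column is integer
ok_tib_like <- is.integer(toy[, "year", drop = FALSE])

# [[ ]] extracts the column itself, for data.frames and tibbles alike
ok_extract <- is.integer(toy[["year"]])

print(c(ok_df, ok_tib_like, ok_extract))  # TRUE FALSE TRUE
```

In practice, the simplest workaround is to pass panelAR() a plain data.frame, for example data = as.data.frame(dat) in the call from the question; as.data.frame() on a tibble returns a base data.frame, so data[, "year"] then drops to an integer vector. If the .dta file was imported with haven::read_dta(), which returns a tibble, that would explain why the imported data behaves differently from the hand-built example.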