
Panel Data Forecasting With R

I have data that is organized in panels like this (see below for output from the dput() function):

Country Year Month Var1 Var2
C1      2000 1     0    0
C1      2000 2     1    0
C1      2000 3     2    1
C2      2000 1     1    1
C2      2000 2     1    2
C2      2000 3     3    1

The data set covers 27 countries over the years 1999 to 2008, but the panels are unbalanced.
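To see how unbalanced the panels are, one option is to count the observations per country and compare each country's first and last observed month. A minimal base-R sketch, using the toy rows from the table above (Var1/Var2 are that table's placeholder columns, not the real variables):

```r
# Toy panel copied from the example table above
data <- data.frame(
  Country = factor(c("C1", "C1", "C1", "C2", "C2", "C2")),
  Year    = rep(2000, 6),
  Month   = c(1, 2, 3, 1, 2, 3),
  Var1    = c(0, 1, 2, 1, 1, 3),
  Var2    = c(0, 0, 1, 1, 2, 1)
)

# Number of monthly observations per country
obs_per_country <- table(data$Country)

# Running month index, so start and end months can be compared across countries
data$t  <- data$Year * 12 + data$Month
first_t <- tapply(data$t, data$Country, min)
last_t  <- tapply(data$t, data$Country, max)

# Balanced here means: same number of months, same first and same last month
balanced <- all(obs_per_country == obs_per_country[1]) &&
  all(first_t == first_t[1]) &&
  all(last_t == last_t[1])

print(obs_per_country)
print(balanced)   # TRUE for this toy table
```

On the full data set, unequal counts or different start/end months per country show up directly in `obs_per_country`, `first_t` and `last_t`.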

I want to estimate a model on the full data set and then use it to forecast for each country in the data set. I have been looking into the YourCast package by King et al., but since all my data are in a single file, I am at a loss as to how to create a data object that the yourcast() function will accept. Does anyone know how to do this without the tedious procedure of manually splitting the data file into the separate cross-sections?
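Base R can do the splitting in a single call: `split()` turns one data frame into a named list holding one data frame per country. Whether yourcast() accepts such a list directly is not checked here; the YourCast documentation describes the exact data-object format it expects. A minimal sketch, again on the toy rows from the table above:

```r
# Toy panel copied from the example table above
data <- data.frame(
  Country = factor(c("C1", "C1", "C1", "C2", "C2", "C2")),
  Year    = rep(2000, 6),
  Month   = c(1, 2, 3, 1, 2, 3),
  Var1    = c(0, 1, 2, 1, 1, 3),
  Var2    = c(0, 0, 1, 1, 2, 1)
)

# One data frame per country, named by country; drop = TRUE skips empty levels
by_country <- split(data, data$Country, drop = TRUE)

# Make sure each cross-section is in time order
by_country <- lapply(by_country, function(d) d[order(d$Year, d$Month), ])

names(by_country)       # "C1" "C2"
by_country[["C2"]]
```

Base R's `unsplit()` goes the other way, reassembling the per-country pieces into one data frame.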

PS: 49 observations from the data set (Belgium and Denmark only):

structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Belgium", 
"Denmark", "Czech.Republic", "Germany", "Estonia", "Greece", 
"Spain", "France", "Ireland", "Italy", "Cyprus", "Latvia", "Lithuania", 
"Luxembourg", "Hungary", "Malta", "Netherlands", "Austria", "Poland", 
"Portugal", "Slovenia", "Slovakia", "Bulgaria", "Romania", "Finland", 
"Sweden", "UK"), class = "factor"), Year = c(2003, 2003, 2003, 
2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2004, 2004, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2003, 
2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 
2004, 2005), Month = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1), Yes = c(21L, 
18L, 20L, 19L, 31L, 39L, 28L, 2L, 28L, 21L, 26L, 50L, 14L, 28L, 
50L, 83L, 10L, 25L, 22L, 6L, 22L, 39L, 32L, 56L, 22L, 17L, 20L, 
20L, 32L, 39L, 23L, 2L, 27L, 21L, 28L, 48L, 14L, 27L, 50L, 89L, 
10L, 25L, 22L, 4L, 22L, 38L, 31L, 56L, 16L), No = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 1L, 2L, 0L, 0L, 0L, 2L, 0L, 1L, 1L, 0L, 0L), Abstention = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 3L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), No.Neg = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L), Abstention.Neg = c(0L, 0L, 0L, 1L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Yes.Neg = c(1L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 
0L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L
), Yes.Pos = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), Missing = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Enlargement = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1)), .Names = c("Country", "Year", "Month", "Yes", 
"No", "Abstention", "No.Neg", "Abstention.Neg", "Yes.Neg", "Yes.Pos", 
"Missing", "Enlargement"), row.names = c(1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L, 24L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 
68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 
81L, 82L, 83L, 84L, 85L), class = "data.frame")
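The sample above only contains rows for Belgium and Denmark, but the Country factor still carries all 27 levels, so a loop over `levels(data$Country)` would also visit countries with no rows. A minimal sketch of dropping unused levels with `droplevels()` and adding a monthly `Date` column; the stand-in data frame uses the first rows of the sample and, for brevity, keeps only three of the 27 factor levels:

```r
# Stand-in for the dput() sample: first rows of Belgium and Denmark,
# with one extra factor level that has no observations
data <- data.frame(
  Country = factor(c("Belgium", "Belgium", "Denmark"),
                   levels = c("Belgium", "Denmark", "Czech.Republic")),
  Year    = c(2003, 2003, 2003),
  Month   = c(1, 2, 1),
  Yes     = c(21L, 18L, 22L)
)

nlevels(data$Country)      # 3, including a country with no rows
data <- droplevels(data)   # remove factor levels without observations
nlevels(data$Country)      # 2

# First day of each month as a Date, handy for sorting and plotting
data$Date <- as.Date(sprintf("%d-%02d-01", data$Year, data$Month))
data <- data[order(data$Country, data$Date), ]
```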


  • If I understand your problem correctly, splitting up the data set is quite easy. Assuming the data frame is named data:

    results <- list()
    for (i in seq_len(nlevels(data$Country))) {
        # forecast for country i; its rows are
        # data[data$Country == levels(data$Country)[i], ]
        results[[levels(data$Country)[i]]] <- yourcast(...)
    }

    This simple loop runs the forecast for each country and stores each result in a list named by country. Afterwards you can retrieve any country's result by name, e.g. results[['Hungary']]

    As I am not familiar with the package you use, here is a small example that can replace the line containing the yourcast() call inside the loop:

    results[[levels(data$Country)[i]]] <- c(levels(data$Country)[i], length(which(data$Country == levels(data$Country)[i])))

    This creates a list with one element per country, each holding two values: the country name and its number of observations (both stored as character strings, because c() coerces the count to character).
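A runnable version of the loop above, with a concrete per-country step. Because the yourcast() arguments are not specified here, the sketch uses a deliberately naive forecast as a hypothetical stand-in (`naive_forecast()` is illustrative only, not part of YourCast): it repeats each country's last observed value of `Yes` for `h` months. The data are the January to April 2003 `Yes` values for Belgium and Denmark from the sample in the question.

```r
# Belgium and Denmark, January to April 2003, "Yes" column from the sample
data <- data.frame(
  Country = factor(rep(c("Belgium", "Denmark"), each = 4)),
  Year    = 2003,
  Month   = rep(1:4, times = 2),
  Yes     = c(21L, 18L, 20L, 19L, 22L, 17L, 20L, 20L)
)

# Hypothetical stand-in for yourcast(): a naive forecast that repeats the
# last observed value of column `y` for `h` future months
naive_forecast <- function(d, y = "Yes", h = 3) {
  d <- d[order(d$Year, d$Month), ]
  rep(tail(d[[y]], 1), h)
}

results <- list()
for (i in seq_len(nlevels(data$Country))) {
  country <- levels(data$Country)[i]
  country_rows <- data[data$Country == country, ]  # this country's panel only
  results[[country]] <- naive_forecast(country_rows)
}

results[["Belgium"]]   # 19 19 19
results[["Denmark"]]   # 20 20 20

# Equivalent without an explicit loop
results_alt <- lapply(split(data, data$Country), naive_forecast)
```

Replacing `naive_forecast()` with the actual forecasting call keeps the same structure: subset by country, fit, and store the result under the country's name.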