Search code examples
rdataframeregressionpanel

Panel Dataset and rolling window regression in R


I have a massive static panel dataset, but let me simplify things with the following example

set.seed(123)

x=c(NA,NA,NA,5.1,5.2,5.3,5.15,4.9,5,5.01)
df=data.frame(country = c(replicate(10,"Germany"),replicate(10, "USA"),replicate(10,"Zimbabwe ")),
          year=c(replicate(3,2000:2009)),log_gdp=c(sample(runif(10,9.8,10.7)),sample(runif(10,11,12)),x))

I have to conduct a rolling window regression per each country over the period T to T+N, by regressing log_gdp over moving windows of 4 years. For instance, take Germany: I should regress the following vector log_gdp_{2000-2003}=[10.21,10.65,9.84,10.30] over T_i=[1,2,3,4] and I should store the related regression (slope) coefficient. Then, R should run the regression starting from the log_gdp value of Germany in 2001 up to 2004, and so it regresses log_gdp_{2001-2004}=[10.65,9.84,10.30,10.06] over T_i=[1,2,3,4] and should store the related slope regression coefficient. The procedure should be done up to 2006-2009 for each country.

Finally, all the beta coefficients should be stored in a new column of the dataset.

Attempt:

library(zoo)
rollapply(zoo(df),
      width=4,
      FUN = function(Z) 
      { 
        t = lm(formula=log_gdp~year, data = as.data.frame(Z), na.rm=T); 
        return(t$coef) 
      },
      by.column=FALSE, align="right") 

Solution

  • The sample code in the question suggests right alignment is wanted but the comments suggest left alignment so using the latter we have the following. Note that the slope equals cov(x, y) / var(x) where x is 1:4 (because slope is shift invariant so the slope against 1:4 is the same as the slope against 2000:2003, say) after both have positions where y is NA removed.

    library(dplyr)
    library(zoo)
    
    slope4 <- function(x) {
      ok <- !is.na(x)
      ix <- seq(4)[ok]
      cov(x[ok], ix) / var(ix)
    }
    
    df %>%
      mutate(roll = rollapply(log_gdp, 4, slope4, fill = NA, align = "left"), 
       .by = country)
    

    giving

         country year   log_gdp         roll
    1    Germany 2000 10.594716 -0.003790421
    2    Germany 2001  9.841001  0.143188999
    3    Germany 2002 10.058820  0.142140697
    4    Germany 2003 10.509475 -0.016120800
    5    Germany 2004 10.168079 -0.022150695
    6    Germany 2005 10.646421 -0.021506850
    7    Germany 2006 10.296292  0.032923416
    8    Germany 2007 10.210953           NA
    9    Germany 2008 10.603177           NA
    10   Germany 2009 10.275295           NA
    11       USA 2000 11.655706 -0.015998540
    12       USA 2001 11.708530  0.062767052
    13       USA 2002 11.594142  0.011156424
    14       USA 2003 11.640507  0.060931924
    15       USA 2004 11.902299 -0.184659799
    16       USA 2005 11.544066  0.053470067
    17       USA 2006 11.963024 -0.117443736
    18       USA 2007 11.147114           NA
    19       USA 2008 11.994270           NA
    20       USA 2009 11.289160           NA
    21 Zimbabwe  2000        NA           NA
    22 Zimbabwe  2001        NA  0.100000000
    23 Zimbabwe  2002        NA  0.100000000
    24 Zimbabwe  2003  5.100000  0.025000000
    25 Zimbabwe  2004  5.200000 -0.105000000
    26 Zimbabwe  2005  5.300000 -0.115000000
    27 Zimbabwe  2006  5.150000 -0.032000000
    28 Zimbabwe  2007  4.900000           NA
    29 Zimbabwe  2008  5.000000           NA
    30 Zimbabwe  2009  5.010000           NA