Search code examples
rwindowregressionrolling-computationpanel-data

Rolling Window Regression by group in R (with dates)


THIS IS MY DATA

I have a panel data in R, so I want to create a rolling window linear regression by group. For instance, I have a lot of dates from 1 to 618. Each number represents one date, but I have more than one observation for each date.

I want to create a rolling window for 20 dates. Finally, i want to output all coefficients for lm(y~x1+x2+x3+x4+x5+x6) in the period 1:20, and make a rolling window for doing another regression for 2:21, 3:22.. and so on for all my observations, so the last coefficients are for 598:618 period (I have 618 so i can´t do it manually).

My problem is that i select a window for 20 observations but i only get to select this 20 first observations, for example: 1 1 1 1 1 1 1 .... 1 and maybe the first 20 observations are only observations for the first date (1), because there are more than one observations by date. So I want to catch 20 observationes filtering by group, actually this will be more than 20 observations, but i want to rolling by date (date 1 to date 20, regardless of the observations.

After that, i need to estimate by Newey West method, so i need include in the final code something like that and output all coefficients and t-statistics.

neweywest <- coeftest(LMOBJECT, vcov. = NeweyWest, lag=12)

I hope it has been understood well.


Solution

  • You can create multiple linear models for a given interval of dates like this:

    library(tidyverse)
    
    # example data
    set.seed(1337)
    n_dates <- 10
    data <- tibble(
      date = runif(100, min = 1, max = n_dates) %>% floor(),
      x1 = runif(100)**2,
      x2 = runif(100) * 2,
      x3 = runif(100) + 2,
      y = x1 + 2 * x2 + runif(100)
    ) %>%
      arrange(date)
    data
    #> # A tibble: 100 × 5
    #>     date         x1    x2    x3     y
    #>    <dbl>      <dbl> <dbl> <dbl> <dbl>
    #>  1     1 0.754      0.700  2.21 2.79 
    #>  2     1 0.0230     1.97   2.70 4.89 
    #>  3     1 0.388      0.500  2.21 1.54 
    #>  4     1 0.225      0.135  2.87 0.849
    #>  5     1 0.00000810 0.139  2.22 1.12 
    #>  6     1 0.255      0.893  2.21 2.25 
    #>  7     1 0.402      1.37   2.06 3.51 
    #>  8     1 0.00275    0.363  2.68 0.984
    #>  9     2 0.238      1.68   2.53 3.98 
    #> 10     2 0.0309     1.47   2.05 3.69 
    #> # … with 90 more rows
    
    # number of rows per day
    data %>% count(date)
    #> # A tibble: 9 × 2
    #>    date     n
    #>   <dbl> <int>
    #> 1     1     8
    #> 2     2    10
    #> 3     3    15
    #> 4     4     8
    #> 5     5    10
    #> 6     6    10
    #> 7     7    12
    #> 8     8     7
    #> 9     9    20
    
    # size of rolling window in days
    window_size <- 3
    
    models <- tibble(
      from = seq(n_dates),
      to = from + window_size - 1
    ) %>%
      mutate(
        data = from %>% map2(to, ~ data %>% filter(date >= .x & date <= .y)),
        model = data %>% map(possibly(~ lm(y ~ x1 + x2 + x3, data = .x), NA))
      )
    models
    #> # A tibble: 10 × 4
    #>     from    to data              model    
    #>    <int> <dbl> <list>            <list>   
    #>  1     1     3 <tibble [33 × 5]> <lm>     
    #>  2     2     4 <tibble [33 × 5]> <lm>     
    #>  3     3     5 <tibble [33 × 5]> <lm>     
    #>  4     4     6 <tibble [28 × 5]> <lm>     
    #>  5     5     7 <tibble [32 × 5]> <lm>     
    #>  6     6     8 <tibble [29 × 5]> <lm>     
    #>  7     7     9 <tibble [39 × 5]> <lm>     
    #>  8     8    10 <tibble [27 × 5]> <lm>     
    #>  9     9    11 <tibble [20 × 5]> <lm>     
    #> 10    10    12 <tibble [0 × 5]>  <lgl [1]>
    
    models %>%
      filter(!is.na(model)) %>%
      transmute(
        from, to,
        coeff = model %>% map(coefficients),
        r2 = model %>% map_dbl(~ .x %>% summary() %>% pluck("r.squared"))
      ) %>%
      unnest_wider(coeff)
    
    # A tibble: 9 x 7
    #   from    to `(Intercept)`    x1    x2      x3    r2
    #  <int> <dbl>         <dbl> <dbl> <dbl>   <dbl> <dbl>
    #1     1     3         0.601 0.883  2.07 -0.0788 0.970
    #2     2     4         0.766 0.965  2.01 -0.141  0.965
    #3     3     5         0.879 0.954  1.94 -0.165  0.953
    

    Another way of subseting groups is to use nest:

    # get all observations from day 3 to 5
    data %>% arrange(date) %>% nest(-date) %>% slice(3:5) %>% unnest()