r window regression rolling-computation panel-data

Rolling Window Regression by group in R (with dates)

THIS IS MY DATA

I have a panel data in R, so I want to create a rolling window linear regression by group. For instance, I have a lot of dates from 1 to 618. Each number represents one date, but I have more than one observation for each date.

I want to create a rolling window for 20 dates. Finally, i want to output all coefficients for lm(y~x1+x2+x3+x4+x5+x6) in the period 1:20, and make a rolling window for doing another regression for 2:21, 3:22.. and so on for all my observations, so the last coefficients are for 598:618 period (I have 618 so i can´t do it manually).

My problem is that i select a window for 20 observations but i only get to select this 20 first observations, for example: 1 1 1 1 1 1 1 .... 1 and maybe the first 20 observations are only observations for the first date (1), because there are more than one observations by date. So I want to catch 20 observationes filtering by group, actually this will be more than 20 observations, but i want to rolling by date (date 1 to date 20, regardless of the observations.

After that, i need to estimate by Newey West method, so i need include in the final code something like that and output all coefficients and t-statistics.

neweywest <- coeftest(LMOBJECT, vcov. = NeweyWest, lag=12)

I hope it has been understood well.

Solution

You can create multiple linear models for a given interval of dates like this:

library(tidyverse)

# example data
set.seed(1337)
n_dates <- 10
data <- tibble(
  date = runif(100, min = 1, max = n_dates) %>% floor(),
  x1 = runif(100)**2,
  x2 = runif(100) * 2,
  x3 = runif(100) + 2,
  y = x1 + 2 * x2 + runif(100)
) %>%
  arrange(date)
data
#> # A tibble: 100 × 5
#>     date         x1    x2    x3     y
#>    <dbl>      <dbl> <dbl> <dbl> <dbl>
#>  1     1 0.754      0.700  2.21 2.79 
#>  2     1 0.0230     1.97   2.70 4.89 
#>  3     1 0.388      0.500  2.21 1.54 
#>  4     1 0.225      0.135  2.87 0.849
#>  5     1 0.00000810 0.139  2.22 1.12 
#>  6     1 0.255      0.893  2.21 2.25 
#>  7     1 0.402      1.37   2.06 3.51 
#>  8     1 0.00275    0.363  2.68 0.984
#>  9     2 0.238      1.68   2.53 3.98 
#> 10     2 0.0309     1.47   2.05 3.69 
#> # … with 90 more rows

# number of rows per day
data %>% count(date)
#> # A tibble: 9 × 2
#>    date     n
#>   <dbl> <int>
#> 1     1     8
#> 2     2    10
#> 3     3    15
#> 4     4     8
#> 5     5    10
#> 6     6    10
#> 7     7    12
#> 8     8     7
#> 9     9    20

# size of rolling window in days
window_size <- 3

models <- tibble(
  from = seq(n_dates),
  to = from + window_size - 1
) %>%
  mutate(
    data = from %>% map2(to, ~ data %>% filter(date >= .x & date <= .y)),
    model = data %>% map(possibly(~ lm(y ~ x1 + x2 + x3, data = .x), NA))
  )
models
#> # A tibble: 10 × 4
#>     from    to data              model    
#>    <int> <dbl> <list>            <list>   
#>  1     1     3 <tibble [33 × 5]> <lm>     
#>  2     2     4 <tibble [33 × 5]> <lm>     
#>  3     3     5 <tibble [33 × 5]> <lm>     
#>  4     4     6 <tibble [28 × 5]> <lm>     
#>  5     5     7 <tibble [32 × 5]> <lm>     
#>  6     6     8 <tibble [29 × 5]> <lm>     
#>  7     7     9 <tibble [39 × 5]> <lm>     
#>  8     8    10 <tibble [27 × 5]> <lm>     
#>  9     9    11 <tibble [20 × 5]> <lm>     
#> 10    10    12 <tibble [0 × 5]>  <lgl [1]>

models %>%
  filter(!is.na(model)) %>%
  transmute(
    from, to,
    coeff = model %>% map(coefficients),
    r2 = model %>% map_dbl(~ .x %>% summary() %>% pluck("r.squared"))
  ) %>%
  unnest_wider(coeff)

# A tibble: 9 x 7
#   from    to `(Intercept)`    x1    x2      x3    r2
#  <int> <dbl>         <dbl> <dbl> <dbl>   <dbl> <dbl>
#1     1     3         0.601 0.883  2.07 -0.0788 0.970
#2     2     4         0.766 0.965  2.01 -0.141  0.965
#3     3     5         0.879 0.954  1.94 -0.165  0.953

Another way of subseting groups is to use nest:

# get all observations from day 3 to 5
data %>% arrange(date) %>% nest(-date) %>% slice(3:5) %>% unnest()