Using lapply and the lm function together in R


I have a df as follows:

   t         r
1   0 100.00000
2   1 135.86780
3   2 149.97868
4   3 133.77316
5   4  97.08129
6   5  62.15988
7   6  50.19177

and so on...

I want to apply a rolling regression using lm(r~t).

However, I want to estimate one model per iteration, where each iteration covers a fixed time window from t to t+k. For example, if k = 5, the first model should be estimated with t=0, t=1, ..., t=5, the second model with t=1, t=2, ..., t=6, and so on.

In other words, it iterates from a starting point with a set window t+k where k is some pre-specified window length and applies the lm function over that particular window length iteratively.

I have tried using lapply like this:

mdls = lapply(df, function(x) lm(r[x,]~t))

However, I got the following error:

Error in r[x, ] : incorrect number of dimensions

If I remove the [x,], each iteration gives me the same model, i.e. one fitted to all the observations.

If I use rollapply:

coefs = rollapply(df, 3, FUN = function(x) coef(lm(r~t, data = 
as.data.frame(x))), by.column = FALSE, align = "right")

res = rollapply(df, 3, FUN = function(z) residuals(lm(r~t, data = 
as.data.frame(z))), by.column = FALSE, align = "right")

Where:

 t = seq(0,15,1)
 r = (100+50*sin(0.8*t))
 df = as.data.frame(t,r)

I get 15 models, but they are all estimated over the entire dataset, so every window returns the same intercept and slope. This is strange, because I had rollapply working just before I tested it in a new script. Now it no longer works, and I cannot tell whether R is playing tricks on me or whether something is wrong with my code.

How can I adjust these methods to make sure they iterate according to my wishes?


Solution

  • Here is a possible solution. The idea is to pass the row indices 1:nrow(df) to rollapply, so that each window tells us which rows of df to use in the fit.

    library(zoo)

    df <- data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316, 97.08129, 62.15988, 50.19177))
    N <- nrow(df)

    # Coefficients: roll over the row indices, 3 consecutive rows per window
    coefs <- rollapply(data = 1:N, width = 3, FUN = function(x) {
      # x holds the row indices of the current window
      r <- df$r[x]
      t <- df$t[x]
      coef(lm(r ~ t))
    })

    # Residuals: same windows, returning the residuals of each fit
    res <- rollapply(data = 1:N, width = 3, FUN = function(x) {
      r <- df$r[x]
      t <- df$t[x]
      residuals(lm(r ~ t))
    })
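
Since the question also asks about lapply, the same windows can be sketched in base R without zoo. This is only a minimal sketch and not part of the answer above. It assumes each window spans k + 1 consecutive rows, as the question describes (k = 5 gives t = 0..5, then t = 1..6), and the names k, starts and mdls are illustrative. It also builds the data with data.frame(t = t, r = r), because as.data.frame(t, r) treats its second argument as row names, so the resulting frame has no r column.

```r
# Example data from the question, built with explicitly named columns
t <- seq(0, 15, 1)
r <- 100 + 50 * sin(0.8 * t)
df <- data.frame(t = t, r = r)

k <- 5                            # assumption: each window spans k + 1 rows
starts <- seq_len(nrow(df) - k)   # first row index of each window

# Fit one model per window by subsetting the rows before calling lm()
mdls <- lapply(starts, function(i) lm(r ~ t, data = df[i:(i + k), ]))

# Collect the coefficients: one row (intercept, slope) per window
coefs <- do.call(rbind, lapply(mdls, coef))
```

Iterating over the starting row indices, rather than over df itself, is what makes each lm call see a different subset of rows. Calling lapply on df directly iterates over its columns.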