
Split, lapply, rbind paradigm. lapply returning lists of numerics instead of date index


I'm practicing time-series analysis on the Red Sox seasons dataset. I need to split the dataset by year and run a calculation on each year, so I believe I need the split, lapply, rbind paradigm. I pass a binary (win/loss) xts column to split, and so far so good: it returns a list of xts objects, correctly split by year.

Then I ran lapply on this list to calculate the cumulative mean of win/loss within each year. The numeric result is correct, but the xts objects are converted to numeric vectors, so I lose my Date index.

What might be the source of this issue?

thank you!

Head of red_sox_xts$win:

            win
2010-04-04   1
2010-04-06   0
2010-04-07   0
2010-04-09   0
2010-04-10   1
2010-04-11   1

1 - First I pass it to split to break it up by year.

red_sox_seasons <- split(red_sox_xts$win, f = 'years')

output:

[[1]]
            win
2010-04-04   1
2010-04-06   0
     .       .
     .       .
     .       .
[[2]]
            win
2011-04-01   0
2011-04-02   0
     .       .
     .       .
     .       .

2 - Next I feed this output to the lapply function.

red_sox_ytd <- lapply(red_sox_seasons, cummean)

output: (This is where the strange behavior begins)

1.   A.1
     B.0.5
      .
      .
      .
2.   A.0
     B.0.5
      .
      .
      .

class(red_sox_ytd) is list, and class(red_sox_ytd[[1]]) is numeric, while it should be xts.

This makes me unable to perform the next step correctly:

do.call(rbind, red_sox_ytd)

Solution

  • Assuming x is the object defined in the Note at the end, we can calculate the cumulative mean by year using ave:

    transform(x, cummean = ave(win, format(time(x), "%Y"), FUN = cummean))
    ##            win   cummean
    ## 2010-04-04   1 1.0000000
    ## 2010-04-06   0 0.5000000
    ## 2010-04-07   0 0.3333333
    ## 2010-04-09   0 0.2500000
    ## 2010-04-10   1 0.4000000
    ## 2010-04-11   1 0.5000000
    

    Another, longer approach keeps the split, lapply, rbind structure:

    do.call("rbind", lapply(split(x, "years"), transform, cummean = cummean(win)))
    

    Note

    Lines <- "date win
    2010-04-04   1
    2010-04-06   0
    2010-04-07   0
    2010-04-09   0
    2010-04-10   1
    2010-04-11   1"
    library(xts)
    library(dplyr)  # assumption: cummean() is dplyr::cummean(); the original does not say
    x <- as.xts(read.zoo(text = Lines, header = TRUE, drop = FALSE))
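
As for the source of the issue: a likely explanation, assuming cummean is dplyr::cummean() or a similar vector helper, is that it returns a bare numeric vector and does not carry over the xts attributes, so each list element loses its index. The sketch below reproduces that and shows one way to keep the original split, lapply, rbind shape by rebuilding an xts from each piece's own index. The helper cummean_plain is a hypothetical base-R stand-in, not the function from the question, and the data are the rows shown in the question.

```r
library(xts)

# Hypothetical stand-in for cummean(): returns a plain numeric vector,
# so any xts index on the input is dropped.
cummean_plain <- function(v) cumsum(as.numeric(v)) / seq_along(v)

# The rows shown in the question (six from 2010, two from 2011).
dates <- as.Date(c("2010-04-04", "2010-04-06", "2010-04-07", "2010-04-09",
                   "2010-04-10", "2010-04-11", "2011-04-01", "2011-04-02"))
x <- xts(c(1, 0, 0, 0, 1, 1, 0, 0), order.by = dates)
colnames(x) <- "win"

seasons <- split(x, f = "years")

# Applying the helper directly gives numeric vectors without an index.
plain <- lapply(seasons, cummean_plain)
class(plain[[1]])

# Rebuild an xts from each result, reusing the index of the piece it came from.
ytd <- lapply(seasons, function(s) {
  xts(cummean_plain(s), order.by = index(s))
})

# Every element is now an xts, so rbind dispatches to the xts method.
red_sox_ytd <- do.call(rbind, ytd)
colnames(red_sox_ytd) <- "cummean"
red_sox_ytd
```

Wrapping the result inside the anonymous function keeps each year's dates attached to its own values, which is what do.call(rbind, ...) needs to reassemble a single time series.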