I'm practicing time-series analysis on the Red Sox seasons dataset. I need to split the data by year and run a calculation on each piece, so I'm pretty sure I need the split, lapply, rbind paradigm. I pass a binary (win/loss) xts column to split, and so far so good: it returns a list of xts objects correctly split by year.
Then I ran lapply on this list to calculate the cumulative mean of win/loss within each year. The numeric result is correct, but the xts objects are converted to plain numeric vectors, so I lose my Date index.
What might be the source of this issue?
Thank you!
Head of red_sox_xts$win:
win
2010-04-04 1
2010-04-06 0
2010-04-07 0
2010-04-09 0
2010-04-10 1
2010-04-11 1
1 - First I feed it to split to divide it by year:
red_sox_seasons <- split(red_sox_xts$win, f = 'years')
output:
[[1]]
win
2010-04-04 1
2010-04-06 0
. .
. .
. .
[[2]]
win
2011-04-01 0
2011-04-02 0
. .
. .
. .
2 - Next I feed this output to lapply:
red_sox_ytd <- lapply(red_sox_seasons, cummean)
output: (This is where the strange behavior begins)
[[1]]
[1] 1.0 0.5 ...
[[2]]
[1] 0.0 0.5 ...
class(red_sox_ytd) is list, but class(red_sox_ytd[[1]]) is numeric when it should be xts.
This makes me unable to perform the next step correctly:
do.call(rbind, red_sox_ytd)
Assuming x shown in the Note at the end, we can calculate the cummean by year using ave:
transform(x, cummean = ave(win, format(time(x), "%Y"), FUN = cummean))
## win cummean
## 2010-04-04 1 1.0000000
## 2010-04-06 0 0.5000000
## 2010-04-07 0 0.3333333
## 2010-04-09 0 0.2500000
## 2010-04-10 1 0.4000000
## 2010-04-11 1 0.5000000
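To see what ave is doing here: it applies FUN separately within each group defined by the grouping vector and returns the results in the original order and length. Below is a minimal base-R sketch on made-up numbers. The vectors win and year and the helper cummean_base are hypothetical stand-ins for illustration, not taken from the dataset or from any package.

```r
# Made-up win/loss values and their season labels (illustrative only)
win  <- c(1, 0, 0, 0, 1)
year <- c("2010", "2010", "2010", "2011", "2011")

# Hypothetical helper: running mean = running sum / running count
cummean_base <- function(v) cumsum(v) / seq_along(v)

# ave() restarts the running mean at the first row of each year
ytd <- ave(win, year, FUN = cummean_base)
ytd
```

Because the result has the same length and order as the input, it can be attached as a new column without touching the time index.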
Another approach (but longer) is:
do.call("rbind", lapply(split(x, "years"), transform, cummean = cummean(win)))
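If you would rather keep the exact split, lapply, rbind shape from the question, one option is to wrap the calculation so that each piece is rebuilt as an xts object on its own index. A function that returns a bare numeric vector gives lapply nothing to carry the Date index with, which would explain the class change described in the question. The sketch below uses made-up data and a hypothetical helper, cummean_xts, written with base R and xts only; treat it as one possible approach under those assumptions rather than the definitive fix.

```r
library(xts)

# Made-up stand-in for red_sox_xts$win, spanning two seasons
dates <- as.Date(c("2010-04-04", "2010-04-06", "2010-04-07",
                   "2011-04-01", "2011-04-02"))
win <- xts(c(1, 0, 0, 0, 1), order.by = dates)
colnames(win) <- "win"

# Hypothetical helper: compute the running mean on the raw values,
# then rebuild an xts object on the original index
cummean_xts <- function(x) {
  values <- cumsum(as.numeric(coredata(x))) / seq_len(nrow(x))
  xts(values, order.by = index(x))
}

seasons <- split(win, f = "years")                    # list of xts, one per year
ytd <- do.call(rbind, lapply(seasons, cummean_xts))   # single xts again
ytd
```

Here every element of the list stays an xts object, so the final do.call(rbind, ...) step binds them back into one series with the dates intact.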
Note
Lines <- "date win
2010-04-04 1
2010-04-06 0
2010-04-07 0
2010-04-09 0
2010-04-10 1
2010-04-11 1"
library(xts)
library(dplyr)  # assumption: cummean() used above is dplyr::cummean (the question does not say)
x <- as.xts(read.zoo(text = Lines, header = TRUE, drop = FALSE))