I'm working on a data-munging pipeline with a lot of date columns in the data. Many R functions (e.g., set operations, sapply
, etc.) do no preserve date class, converting the dates to integers.
The strategies I see to deal with this are:
as.Date()
is often tedious.by = "month"
) in the intermediate munging steps impossible without first converting to date.Any other options I'm missing? Is there a way to make R play nice with dates? To clarify, the data I'm dealing with is not just a time series: multiple columns contain dates. So, as far as I can tell, xts
is of limited usefulness.
The does not preserve Date
class misfeature is an artefact of R itself, and how some base R functions are implemented. See e.g.
R> dates <- Sys.Date() + 0:2
R> for (d in dates) cat(d, "\n")
17532
17533
17534
R>
Essentially, the S3 class attributes gets dropped when you do certain vector operations:
R> as.vector(dates)
[1] 17532 17533 17534
R>
So my recommendation is to pick a good container type you like and stick with it to do the operations there. I like data.table a lot for this. A quick example:
R> suppressMessages(library(data.table))
R> dt <- data.table(date=Sys.Date()+0:2, other=Sys.Date() + cumsum(runif(3)*100))
R> dt[, diff:=other-date][]
date other diff
1: 2018-01-01 2018-03-30 88.88445 days
2: 2018-01-02 2018-06-09 158.23913 days
3: 2018-01-03 2018-07-30 208.62187 days
R> dt[, month:=month(other)][]
date other diff month
1: 2018-01-01 2018-03-30 88.88445 days 3
2: 2018-01-02 2018-06-09 158.23913 days 6
3: 2018-01-03 2018-07-30 208.62187 days 7
R>
Not only does the Date
type persist (as evidenced by the difference operation returning a difftime
object), but you also gets lots of helper
functions (like month()
) here. Grouping by date is also natural.