I have a dataframe that looks like this:
DAY X1996 X1997
1 1-Jul 98 86
2 2-Jul 97 90
3 3-Jul 97 93
....
I want to end up with a TS object so that I can do HoltWinters smoothing on it. I think I want it to look like this (though I'm not sure because I haven't done HoltWinters before):
Day Year Temp
1-Jul 1996 98
2-Jul 1996 97
3-Jul 1996 97
...
1-Jul 1997 86
2-Jul 1997 90
3-Jul 1997 93
This is what I'm trying to do:
df <- read.delim("temps.txt")
myts <- as.ts(df)
But this doesn't look close to what I'll need for a HoltWinters model. I've looked all over Stack Overflow and the docs for ts and zoo and I'm stuck on how to create this ts object. A push in the right direction would be much appreciated.
ts objects are normally used with monthly, quarterly or annual data, not daily data; however, if we remove Feb 29th then we can create a ts object whose times are the year plus a fraction 0/365, 1/365, ..., 364/365 which will be regularly spaced if there are no missing dates. The key point is that if the seasonality is based on a year then we must have the same number of points in each year to represent it as a ts object.
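As a minimal base R sketch of that time index, the following builds a ts with 365 points per year. The values are synthetic stand-ins and not the data from temps.txt, and the names temps and daily are made up for the illustration.

```r
# Synthetic stand-in data (NOT from temps.txt): two years, 365 points each,
# i.e. as if Feb 29th had already been removed.
set.seed(1)
n <- 365 * 2
temps <- round(80 + 10 * sin(2 * pi * (0:(n - 1)) / 365) + rnorm(n))

# frequency = 365 tells ts() that one cycle (a year) has 365 observations.
daily <- ts(temps, start = c(1996, 1), frequency = 365)

# The times are 1996 + 0/365, 1996 + 1/365, ...; point 366 falls on 1997.
head(time(daily), 3)
frequency(daily)
```

Because every year contributes exactly 365 points, the same position within each cycle always refers to the same calendar day.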
First convert to a zoo object z0 having an ordinary Date, remove Feb 29th giving z, create the time index described above in a zoo object zz and then convert that to ts.
library(data.table)
library(lubridate)
library(zoo)
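# reshape to long form: one row per (DAY, year column) pair
m <- melt(as.data.table(df), id.vars = 1)
# build a Date index from strings such as "X1996 1-Jul"
z0 <- with(m, zoo(value, as.Date(paste(variable, DAY), "X%Y %d-%b")))
# drop Feb 29th so that every year has 365 points
z <- z0[! (month(time(z0)) == 2 & day(time(z0)) == 29)]
tt <- time(z)
# year plus fraction; days after Feb in leap years are shifted back by one
zz <- zoo(coredata(z), year(tt) + (yday(tt) - ((month(tt) > 2) & leap_year(tt)) - 1)/365)
as.ts(zz)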
Above we removed Feb 29th in leap years. An alternative is to remove Dec 31st in leap years instead, which gives slightly simpler code: there is no need for leap_year because we can simply drop any day for which yday is 366. z0 is from above.
zz0 <- z0[yday(time(z0)) <= 365]
tt <- time(zz0)
zz <- zoo(coredata(zz0), year(tt) + (yday(tt) - 1) / 365)
as.ts(zz)
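Once a ts with 365 points per year exists, it can be passed to HoltWinters. The sketch below is only an illustration on synthetic data (three years, not the data from the question). The alpha and gamma values are arbitrary placeholders chosen so that the call does not depend on the optimizer, and beta = FALSE drops the trend term. In practice one would probably leave the parameters unset and let HoltWinters estimate them.

```r
# Synthetic stand-in data (NOT from temps.txt): three years of 365 points.
# HoltWinters uses the first two periods for start values by default,
# so more than two full years are needed for a seasonal fit.
set.seed(42)
n <- 365 * 3
temps <- 80 + 10 * sin(2 * pi * (0:(n - 1)) / 365) + rnorm(n)
daily <- ts(temps, start = c(1996, 1), frequency = 365)

# alpha and gamma are arbitrary placeholder values, not recommendations;
# beta = FALSE fits a model with level and seasonal components only.
fit <- HoltWinters(daily, alpha = 0.2, beta = FALSE, gamma = 0.3)

# Forecast the next 30 days.
fc <- predict(fit, n.ahead = 30)
```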
Another approach would be to reduce the data to monthly data. Then it is relatively straightforward since ts has facilities to represent monthly data. Below we use the last point in each month but we could use the mean value or another scalar summary if desired.
ag <- aggregate(z0, as.yearmon, tail, 1) # use last point in each month
as.ts(ag)
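For comparison, the same monthly reduction can be sketched in base R without zoo. This is an assumed equivalent on synthetic data rather than the code above, and the names dates, temps, last_by_month and monthly are made up for the illustration.

```r
# Synthetic stand-in data (NOT from temps.txt): daily values for 1996-1997.
dates <- seq(as.Date("1996-01-01"), as.Date("1997-12-31"), by = "day")
set.seed(1)
temps <- round(80 + 10 * sin(2 * pi * seq_along(dates) / 365) + rnorm(length(dates)))

# Group by "YYYY-MM" and keep the last value in each month.
# The group labels sort chronologically because they are zero padded.
last_by_month <- tapply(temps, format(dates, "%Y-%m"), function(v) v[length(v)])

# Monthly ts: 12 observations per year, starting in January 1996.
monthly <- ts(as.numeric(last_by_month), start = c(1996, 1), frequency = 12)
```

Leap days need no special handling here because each month collapses to a single value.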
df in the question made into a reproducible form is the following (however, we would need to fill it out with more data to avoid generating a ts object with many NAs).
df <- structure(list(DAY = structure(1:3, .Label = c("1-Jul", "2-Jul",
"3-Jul"), class = "factor"), X1996 = c(98L, 97L, 97L), X1997 = c(86L,
90L, 93L)), class = "data.frame", row.names = c("1", "2", "3"
))