
Converting a data frame into TS object in R


I have a dataframe that looks like this:

  DAY     X1996 X1997 
1 1-Jul    98    86   
2 2-Jul    97    90   
3 3-Jul    97    93   
....

I want to end up with a TS object so that I can do HoltWinters smoothing on it. I think I want it to look like this (though I'm not sure because I haven't done HoltWinters before):

Day    Year   Temp
1-Jul  1996   98
2-Jul  1996   97
3-Jul  1996   97
...
1-Jul  1997   86
2-Jul  1997   90
3-Jul  1997   93

This is what I'm trying to do:

df <- read.delim("temps.txt")
myts <- as.ts(df)

But this doesn't look close to what I'll need to fit a HoltWinters model. I've looked all over Stack Overflow and the docs for ts and zoo, and I'm stuck on how to create this ts object. A push in the right direction would be much appreciated.


Solution

  • ts objects are normally used with monthly, quarterly or annual data, not daily data. However, if we remove Feb 29th we can create a ts object whose times are the year plus a fraction 0/365, 1/365, ..., 364/365; these times are regularly spaced provided there are no missing dates. The key point is that if the seasonality is based on a year, then every year must have the same number of points to be represented as a ts object.

    First convert to a zoo object z0 having an ordinary Date, remove Feb 29th giving z, create the time index described above in a zoo object zz and then convert that to ts.

    library(data.table)
    library(lubridate)
    library(zoo)
    
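    # reshape wide (one column per year) to long: DAY, variable, value
    m <- melt(as.data.table(df), id.vars = 1)
    # build Dates from strings such as "X1996 1-Jul"; %b assumes English month abbreviations in the current locale
    z0 <- with(m, zoo(value, as.Date(paste(variable, DAY), "X%Y %d-%b")))
    # drop Feb 29th so that every year has 365 points
    z <- z0[! (month(time(z0)) == 2 & day(time(z0)) == 29)]
    
    tt <- time(z)
    # year + (day of year - 1)/365, shifting days after Feb in leap years back by one
    zz <- zoo(coredata(z), year(tt) + (yday(tt) - ((month(tt) > 2) & leap_year(tt)) - 1)/365)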
    as.ts(zz)
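
Once a regularly spaced daily series exists, the smoothing step the question asks about can be sketched with HoltWinters from base R's stats package. This is a minimal sketch under assumptions that are not in the original: the temperatures are synthetic (a sine wave plus noise), the series is built directly with ts(..., frequency = 365) rather than through zoo, and the smoothing parameters are fixed at arbitrary values so the example runs deterministically.

```r
# Synthetic daily temperatures for three 365-day years (made-up data, for illustration only)
set.seed(1)
n_years <- 3
day_of_year <- rep(0:364, times = n_years)
temps <- 80 + 10 * sin(2 * pi * day_of_year / 365) + rnorm(length(day_of_year))

# 365 observations per year, starting at the first day of 1996
temp_ts <- ts(temps, start = c(1996, 1), frequency = 365)

# Additive seasonal model with trend (the default form).
# alpha, beta and gamma are arbitrary assumed values; leave them out to have
# HoltWinters estimate them by minimising the squared one-step prediction error.
fit <- HoltWinters(temp_ts, alpha = 0.2, beta = 0.1, gamma = 0.3)

# Forecast the next 30 days
fc <- predict(fit, n.ahead = 30)
```

HoltWinters needs at least two full seasonal periods to compute its starting values, so the three-row df from the question would have to be filled out with the complete data before a fit like this is possible.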
    

    Remove Dec 31 in leap years

    Above we removed Feb 29th in leap years but an alternate approach would be to remove Dec 31st in leap years giving slightly simpler code which avoids the need to use leap_year as we can simply remove any day for which yday is 366. z0 is from above.

    zz0 <- z0[yday(time(z0)) <= 365]
    tt <- time(zz0)
    zz <- zoo(coredata(zz0), year(tt) + (yday(tt) - 1) / 365)
    as.ts(zz)
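
The regular spacing that this index relies on can be checked with a small base R sketch. The dates below are generated for illustration and are not the asker's data, and as.POSIXlt stands in for lubridate's yday and year only to keep the sketch free of package dependencies.

```r
# Every calendar day of 1996 (a leap year) and 1997, generated for illustration
dates <- seq(as.Date("1996-01-01"), as.Date("1997-12-31"), by = "day")

lt <- as.POSIXlt(dates)
day_of_year <- lt$yday + 1      # POSIXlt counts from 0, so add 1 to get 1..366
yr <- lt$year + 1900            # POSIXlt stores years since 1900

# Drop day 366, i.e. Dec 31st of the leap year
keep <- day_of_year <= 365
idx <- yr[keep] + (day_of_year[keep] - 1) / 365

# Every step between consecutive index values should be 1/365
steps_ok <- isTRUE(all.equal(diff(idx), rep(1 / 365, length(idx) - 1)))
steps_ok
```

If a date were missing from the input, one of the steps would be a multiple of 1/365 and the check would fail, which is the situation in which the conversion to ts fills the gaps with NAs.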
    

    Aggregating to Monthly

    Another approach would be to reduce the data to monthly data. This is relatively straightforward since ts has facilities to represent monthly data. Below we use the last point in each month, but we could use the mean value or another scalar summary if desired.

    ag <- aggregate(z0, as.yearmon, tail, 1)  # use last point in each month
    as.ts(ag)
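
The mean-based variant mentioned above can be sketched in base R as well. This is only an illustration under stated assumptions: the daily values are synthetic, and tapply plus ts stand in for the zoo aggregate call so that the block runs without packages. With zoo and z0 available, replacing tail, 1 with mean in the aggregate call above should give the equivalent result.

```r
# One year of synthetic daily temperatures (made-up data, for illustration only)
set.seed(1)
dates <- seq(as.Date("1996-07-01"), as.Date("1997-06-30"), by = "day")
temps <- 85 + rnorm(length(dates), sd = 5)

# Group by calendar month; "YYYY-MM" keys sort chronologically
month_key <- format(dates, "%Y-%m")
monthly_mean <- tapply(temps, month_key, mean)

# Monthly ts starting in July 1996
monthly_ts <- ts(as.numeric(monthly_mean), start = c(1996, 7), frequency = 12)
```

A monthly series like this has frequency 12, so HoltWinters would need at least two years of it to fit a seasonal component.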
    

    Note

    df in the question made into a reproducible form is the following (however, we would need to fill it out with more data to avoid generating a ts object with many NAs).

    df <- structure(list(DAY = structure(1:3, .Label = c("1-Jul", "2-Jul", 
    "3-Jul"), class = "factor"), X1996 = c(98L, 97L, 97L), X1997 = c(86L, 
    90L, 93L)), class = "data.frame", row.names = c("1", "2", "3"
    ))