Search code examples
rtime-seriesxts

dealing with NA in seasonal cycle analysis R


I have a timeseries of monthly data with lots of missing datapoints, set to NA. I want to simply subtract the annual cycle from the data, ignoring the missing entries. It seems that the decompose function can't handle missing data points, but I have seen elsewhere that the seasonal package is suggested instead. However I am also running into problems there too with the NA.

Here is a minimum reproducible example of the problem using a built in dataset...

library(seasonal)

# set range to missing NA in Co2 dataset
c2<-co2
c2[c2>330 & c2<350]=NA
seas(c2,na.action=na.omit)

Error in na.omit.ts(x) : time series contains internal NAs

Yes, I know! that's why I asked you to omit them! Let's try this:

seas(c2,na.action=na.x13)

Error: X-13 run failed

Errors:
- Adding MV1981.Apr exceeds the number of regression effects
  allowed in the model (80).

Hmmm, interesting, no idea what that means, okay, please just exclude the NA:

seas(c2,na.action=na.exclude)

Error in na.omit.ts(x) : time series contains internal NAs

that didn't help much! and for good measure

decompose(c2)

Error in na.omit.ts(x) : time series contains internal NAs

I'm on the following:

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

Why is leaving out NA such a problem? I'm obviously being completely stupid, but I can't see what I'm doing wrong with the seas function. Happy to consider an alternative solution using xts.


Solution

  • My first solution, simply manually calculating the seasonal cycle, converting to a dataframe to subtract the vector and then transforming back.

    # seasonal cycle
    scycle=tapply(c2,cycle(c2),mean,na.rm=T) 
    # converting to df
    df=tapply(c2, list(year=floor(time(c2)), month = cycle(c2)), c)
    # subtract seasonal cycle
    for (i in 1:nrow(df)){df[i,]=df[i,]-scycle}
    # convert back to timeseries
    anomco2=ts(c(t(df)),start=start(c2),freq=12)
    

    Not very pretty, and not very efficient either.

    The comment of missuse lead me to another Seasonal decompose of monthly data including NA in r I missed with a near duplicate question and this suggested the package zoo, which seems to work really well for additive series

    library(zoo)
    c2=co2
    c2[c2>330&c2<350]=NA
    d=decompose(na.StructTS(c2)) 
    plot(co2)
    lines(d$x,col="red")
    

    shows that the series is very well reconstructed through the missing period.

    black lines shows Co2 series with missing chunk and the red line is the reconstructed series

    The output of deconstruct has the trend and seasonal cycle available. I wish I could transfer my bounty to user https://stackoverflow.com/users/516548/g-grothendieck for this helpful response. Thanks to user missuse too.

    However, if the missing portion is at the end of the series, the software has to extrapolate the trend and has more difficulties. The original series (in black) maintains the trend, while the trend is smaller in the reconstructed series (red):

    c2=co2
    c2[c2>350]=NA
    d=decompose(na.StructTS(c2)) 
    plot(co2)
    lines(d$x,col="red")
    

    extrapolation of data using zoo

    Lastly, if instead the missing portion is at the start of the series, the software is unable to extrapolate backwards in time and throws an error... I feel another SO question coming on...

    c2=co2
    c2[c2<330]=NA
    d=decompose(na.StructTS(c2)) 
    
    Error in StructTS(y) :  
    the first value of the time series must not be missing