Tags: r, time-series, var, forecasting, arima

Time series with missing weekend values: keeping the dates in the plot


I have 1241 daily observations from 2012-11-19 to 2017-10-16, but only for weekdays (the number of services per day in a cafeteria). I'm trying to make a forecast, but I'm having trouble initializing my time series:

timeseries = ts(passage, frequency = 365,
   start = c(2012, as.numeric(format(as.Date("2012-11-19"), "%j"))),
   end = c(2017, as.numeric(format(as.Date("2017-10-16"), "%j"))) )

If I do it like that, then because the weekends are missing, my data gets recycled: after reaching observation 1241 it starts over from the beginning and runs all the way to 1791 (roughly the number of days between my two dates). And if I want to make a training series by choosing a date with the parameter "end", the cut-off no longer corresponds to the data actually recorded on that date.
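A quick way to see the recycling, using the dummy data 1:1241 in place of passage: ts() derives the number of slots from start, end and frequency (1791 here; the calendar span is 1793 days inclusive, because ts() treats every year as exactly 365 slots) and, when there are fewer data points than slots, silently reuses them from the beginning. This is a sketch on made-up data, not the real series.

```r
# Dummy data standing in for passage: 1241 values, one per working day.
passage <- 1:1241

start_doy <- as.numeric(format(as.Date("2012-11-19"), "%j"))  # day of year: 324
end_doy   <- as.numeric(format(as.Date("2017-10-16"), "%j"))  # day of year: 289

timeseries <- ts(passage, frequency = 365,
                 start = c(2012, start_doy), end = c(2017, end_doy))

length(timeseries)  # 1791 slots: (2017 - 2012) * 365 + 289 - 324 + 1
timeseries[1241]    # 1241, the last real value
timeseries[1242]    # 1: ts() has started reusing the data from the beginning
```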

So how can I overcome this problem? I know that I can create my time series directly with

timeseries = ts(passage, frequency = 365)

but then I lose the ability to choose a start and end date, and I can't see that information in a plot. (Also, am I choosing the right frequency? If I use 5 or 7, the axis runs out to years far in the future.)
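On the frequency question: ts() counts time in units of one full cycle, so with frequency 5 (or 7) each "year" on the axis is really one week of 5 (or 7) observations, and 1241 observations span roughly 248 (or 177) units. A small sketch with dummy data and an assumed start of c(2012, 1) shows where the axis ends up:

```r
passage <- 1:1241  # dummy data

# With frequency f, consecutive observations are 1/f time units apart,
# so 1241 observations cover 1240 / f units after the start.
x5 <- ts(passage, frequency = 5, start = c(2012, 1))
x7 <- ts(passage, frequency = 7, start = c(2012, 1))

end(x5)  # 2260 1: 1240 / 5 = 248 "years" (really weeks) after 2012
end(x7)  # 2189 2: 1240 / 7 is about 177.1
```

The labels on such an axis are week counts offset from the start value, not calendar years.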

Edit: The reason I want to keep five-day weeks is so that when I plot the forecast, I don't get lots of zeros in the plot:

library(forecast)  # provides forecast()
plot(forecast(timeseries_00))

[Plot: forecast with lots of zeros]


Solution

  • If I understand your problem correctly, this could be a solution:

    Step 1) I create a dummy time series (passage) with length 1241, like yours.

    passage <- 1:1241
    

    [Plot: "passage" time series]

    Step 2) I convert the time series into a matrix where each column is one working day (padding with 4 zeros, because the series ends on a Monday). Then I add two extra columns of zeros (Saturday and Sunday), turn the matrix back into a single series, row by row, with the function unmatrix() from the gdata package, and delete the last 6 zeros (the 4 I added plus the final Saturday and Sunday).

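    library(gdata)  # provides unmatrix()
    # one row per week: 249 weeks x 5 weekdays (1241 values + 4 padding zeros), then zero columns for Saturday and Sunday
    passage_matrix <- cbind(t(matrix(c(passage, c(0, 0, 0, 0)), nrow = 5)), 0, 0)
    # flatten row by row, so each week reads Monday..Sunday
    passage_00 <- as.numeric(unmatrix(passage_matrix, byrow = TRUE))
    # drop the 4 padding zeros and the last Saturday/Sunday
    passage_00 <- passage_00[1:(length(passage_00) - 6)]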
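If you would rather not depend on gdata, the same row-by-row flattening can be done in base R: as.vector() reads a matrix column by column, so transposing first yields each week's Monday to Sunday in order. A sketch with the same dummy passage:

```r
passage <- 1:1241  # dummy data, as in Step 1

# 249 weeks x 5 weekdays (1241 values + 4 padding zeros), one week per row,
# then two zero columns for Saturday and Sunday
passage_matrix <- cbind(t(matrix(c(passage, rep(0, 4)), nrow = 5)), 0, 0)

# t() turns each week into a column; as.vector() then reads Monday..Sunday per week
passage_00 <- as.vector(t(passage_matrix))

# drop the 4 padding zeros and the final Saturday and Sunday
passage_00 <- passage_00[1:(length(passage_00) - 6)]

length(passage_00)  # 1737
passage_00[1:8]     # 1 2 3 4 5 0 0 6
```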
    

    Step 3) I create my new time series

    timeseries_00 = ts(passage_00, 
                       frequency = 365,
                       start = c(2012, as.numeric(format(as.Date("2012-11-19"), 
                       "%j"))))
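Step 2 assumes the 1241 values are consecutive weekdays with no gaps, which is worth checking before trusting the date labels. Counting the weekdays between 2012-11-19 and 2017-10-16 gives 1281, so about 40 weekdays (holidays or closures, presumably) have no observation, and the 1737-day zero-filled series from Step 2 ends on 2017-08-21 rather than 2017-10-16. If the real date of every observation is available, one alternative (a different technique from the matrix approach) is to place each value on a daily calendar by matching dates. In the sketch below, obs_dates is a hypothetical vector of observation dates simulated with three made-up closures, passage is dummy data, and missing days are NA rather than 0, so a genuine zero would not be mistaken for a weekend. Whether a given forecasting function accepts NA values needs checking before passing it this series.

```r
first_day <- as.Date("2012-11-19")
last_day  <- as.Date("2017-10-16")
all_days  <- seq(first_day, last_day, by = "day")

# format(, "%u") gives the weekday as a number: 1 = Monday ... 7 = Sunday
is_weekday <- as.integer(format(all_days, "%u")) <= 5
sum(is_weekday)         # 1281 weekdays in the range
sum(is_weekday) - 1241  # 40 weekdays without an observation
first_day + 1737 - 1    # "2017-08-21": last date implied by the 1737-day series from Step 2

# Date-matching alternative. obs_dates is hypothetical: the real date of each
# observation, simulated here as every weekday minus three made-up closures.
obs_dates <- all_days[is_weekday][-c(10, 20, 30)]
passage   <- seq_along(obs_dates)  # dummy values

# Full daily calendar, NA wherever there is no observation (weekends, closures)
passage_daily <- rep(NA_real_, length(all_days))
passage_daily[match(obs_dates, all_days)] <- passage
```

For plotting, the labels for the observed points are then all_days[!is.na(passage_daily)], so they stay correct even when weekdays are missing.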
    

    Step 4) Now I can plot the time series with correct date labels (only for the working days, in my example below)

    date <- seq(from = as.Date("2012-11-19"), by = 1, length.out = length(timeseries_00))
    # keep only working days (weekend slots hold 0) and label each point with its date
    plot(timeseries_00[timeseries_00 > 0], axes = FALSE)
    axis(1, at = 1:length(timeseries_00[timeseries_00 > 0]), labels = date[timeseries_00 > 0])
    

    [Plot: "passage" time series with correct dates]

    Step 5) Forecast the time series

    library(forecast)  # provides forecast()
    for_00 <- forecast(timeseries_00)
    

    Step 6) I extend my original time series so that it also covers the forecast horizon, giving the original data and the forecast the same total length

    h <- length(for_00$mean)  # length of the prediction (730 by default: forecast() uses h = 2 * frequency)
    passage_00extended <- c(passage_00, rep(0, h))  # add zeros for the future dates
    timeseries_00extended = ts(passage_00extended, frequency = 365,
                       start = c(2012, as.numeric(format(as.Date("2012-11-19"), "%j"))))
    date <- seq(from = as.Date("2012-11-19"), by = 1, length.out = length(timeseries_00extended))
    

    Step 7) I pad the predicted data with NA so they have the same length as timeseries_00extended, and I change all fake data (0 values) to NA

    pred_mean <- c(rep(NA, length(passage_00)), for_00$mean)         # prediction mean
    pred_upper <- c(rep(NA, length(passage_00)), for_00$upper[, 2])  # upper 95% bound
    pred_lower <- c(rep(NA, length(passage_00)), for_00$lower[, 2])  # lower 95% bound
    passage_00extended[passage_00extended == 0] <- NA
    

    Step 8) I plot the original data (passage_00extended) and the predictions on the same plot, with different colours for the mean (blue) and the upper and lower bounds (orange)

    plot(passage_00extended, axes = FALSE,
         ylim = range(c(passage_00extended, pred_lower, pred_upper), na.rm = TRUE))
    lines(pred_mean, col = "blue")
    lines(pred_upper, col = "orange")
    lines(pred_lower, col = "orange")
    axis(1, at = 1:length(timeseries_00extended), labels = date)
    

    [Plot: forecast]
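The final padding-and-plotting stage (extending the series, padding the predictions with NA, drawing both) can also be done against a real Date x-axis, which avoids building the axis labels by hand. The sketch below uses small dummy vectors so it runs without fitting a model: observed stands in for the observed daily series, and fc_mean, fc_upper and fc_lower are made-up stand-ins for for_00$mean, for_00$upper[, 2] and for_00$lower[, 2].

```r
# Dummy observed series: 3 weeks of daily slots from 2012-11-19, NA at weekends
obs_days <- seq(as.Date("2012-11-19"), by = "day", length.out = 21)
observed <- ifelse(as.integer(format(obs_days, "%u")) <= 5, seq_along(obs_days), NA)

# Made-up forecast for the next h days (stand-ins for the for_00 components)
h <- 14
fc_mean  <- rep(25, h)
fc_upper <- fc_mean + 3
fc_lower <- fc_mean - 3

# One date axis covering history plus horizon; pad each series with NA
all_dates <- seq(obs_days[1], by = "day", length.out = length(observed) + h)
pad    <- rep(NA, length(observed))
y_obs  <- c(observed, rep(NA, h))
y_mean <- c(pad, fc_mean)
y_up   <- c(pad, fc_upper)
y_low  <- c(pad, fc_lower)

# Date x-values let plot() label the axis with dates on its own
plot(all_dates, y_obs, ylim = range(c(y_obs, y_low, y_up), na.rm = TRUE),
     xlab = "date", ylab = "passage")
lines(all_dates, y_mean, col = "blue")
lines(all_dates, y_up,   col = "orange")
lines(all_dates, y_low,  col = "orange")
```

With real data, h would be length(for_00$mean), and the forecast dates simply continue day by day from the last observed date.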