Tags: r, time-series, forecasting, timeserieschart

Time series graph does not show a fluid line


I am quite new to R and am trying to produce an ARIMA time series forecast. The data I am looking at is electricity load per 15 minutes. My data looks as follows:

 day month year PTE periode_van periode_tm gemeten_uitwisseling
 1   1    01 2010   1      0 secs   900 secs                 2636
 2   1    01 2010   2    900 secs  1800 secs                 2621
 3   1    01 2010   3   1800 secs  2700 secs                 2617
 4   1    01 2010   4   2700 secs  3600 secs                 2600
 5   1    01 2010   5   3600 secs  4500 secs                 2582
 geplande_import geplande_export                date weekend
 1             719            -284 2010-01-01 00:00:00       0
 2             719            -284 2010-01-01 00:15:00       0
 3             719            -284 2010-01-01 00:30:00       0
 4             719            -284 2010-01-01 00:45:00       0
 5             650            -253 2010-01-01 01:00:00       0
 weekday Month gu_ma
 1       5    01    NA
 2       5    01    NA
 3       5    01    NA
 4       5    01    NA
 5       5    01    NA

To create a time series I have used the following code:

library("zoo")
ZOO <- zoo(NLData$gemeten_uitwisseling, 
order.by=as.POSIXct(NLData$date, format="%Y-%m-%d %H:%M:%S"))

ZOO <- na.approx(ZOO)
tsNLData <- ts(ZOO)

plot(tsNLData)

I have also tried the following:

NLDatats <- ts(NLData$gemeten_uitwisseling, frequency = 96)

However, when I plot the data I get the following:

Time Series plot

How can I solve this?


Solution

  • There doesn't seem to be any problem with your graph. Your data come in 15-minute intervals, and you are plotting 4 years' worth of them, which is roughly 140,000 observations at 96 per day. A single plot cannot resolve that many points, so the line naturally collapses into a dark shaded region.

    If you are struggling to handle this much data, you can consider randomly sampling rows from your data frame before plotting, although this will remove the seasonality and autocorrelation from the plotted series. That can be helpful if you want to see the average level of your outcome over time, but it is not helpful for seeing the seasonal and autocorrelation structure in the data.

    See the code below, which uses dplyr and ggplot2 to plot a simulated time series that illustrates these issues. It is usually easiest to start with simulated data and then move on to your own data.

    library(ggplot2)
    library(dplyr)
    
    # Simulate an AR(1) series with 10,000 observations
    sim_data <- arima.sim(model = list(ar = .88, order = c(1, 0, 0)), n = 10000, sd = .3)
    
    # data_frame() is deprecated, so build the plotting data once with data.frame()
    sim_df <- data.frame(y = as.numeric(sim_data), x = 1:10000)
    
    # Too many points
    sim_df %>% ggplot(aes(y = y, x = x)) + geom_line() +
      theme_minimal() + xlab('Time') + ylab('Y_t')
    
    # Sample from data (random sample)
    # However, this will remove autocorrelation/seasonality
    sim_df %>% sample_n(500) %>%
      ggplot(aes(y = y, x = x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')
    
    # Plot a subset, which preserves autocorrelation and seasonality
    sim_df %>% slice(1:300) %>%
      ggplot(aes(y = y, x = x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')
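
If the goal is the average level over time rather than the fine structure, aggregating by day is an alternative to random sampling, and a one-week window at full resolution keeps the daily pattern visible. The following is only a minimal base R sketch on made-up data: `demo`, `value`, `daily` and `first_week` are hypothetical names, and the simulated values merely stand in for `NLData$date` and `NLData$gemeten_uitwisseling`.

```r
# Simulated stand-in for a 15-minute load series (values are invented).
set.seed(1)
n_days  <- 28
per_day <- 96  # 24 hours * 4 quarter-hours
stamps  <- seq(as.POSIXct("2010-01-01 00:00:00", tz = "UTC"),
               by = "15 min", length.out = n_days * per_day)
value   <- 2500 + 300 * sin(2 * pi * seq_along(stamps) / per_day) +
           rnorm(length(stamps), sd = 50)
demo    <- data.frame(date = stamps, value = value)

# Option 1: aggregate to one mean value per calendar day.
demo$day <- as.Date(demo$date, tz = "UTC")
daily    <- aggregate(value ~ day, data = demo, FUN = mean)

# Option 2: keep the 15-minute resolution, but look at the first week only.
week_end   <- stamps[1] + 7 * 24 * 3600
first_week <- demo[demo$date < week_end, ]

plot(daily$day, daily$value, type = "l", xlab = "Day", ylab = "Mean load")
plot(first_week$date, first_week$value, type = "l", xlab = "Time", ylab = "Load")
```

The daily means smooth away the within-day cycle, so they suit a long-run overview, while the one-week window is the one to inspect when checking the seasonality that an ARIMA model would need to capture.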