I am quite new to R and am trying to perform an ARIMA time series forecast. The data I am looking at is electricity load per 15 minutes. My data looks as follows:
day month year PTE periode_van periode_tm gemeten_uitwisseling
1 1 01 2010 1 0 secs 900 secs 2636
2 1 01 2010 2 900 secs 1800 secs 2621
3 1 01 2010 3 1800 secs 2700 secs 2617
4 1 01 2010 4 2700 secs 3600 secs 2600
5 1 01 2010 5 3600 secs 4500 secs 2582
geplande_import geplande_export date weekend
1 719 -284 2010-01-01 00:00:00 0
2 719 -284 2010-01-01 00:15:00 0
3 719 -284 2010-01-01 00:30:00 0
4 719 -284 2010-01-01 00:45:00 0
5 650 -253 2010-01-01 01:00:00 0
weekday Month gu_ma
1 5 01 NA
2 5 01 NA
3 5 01 NA
4 5 01 NA
5 5 01 NA
To create a time series I have used the following code:
library("zoo")
ZOO <- zoo(NLData$gemeten_uitwisseling,
order.by=as.POSIXct(NLData$date, format="%Y-%m-%d %H:%M:%S"))
ZOO <- na.approx(ZOO)
tsNLData <- ts(ZOO)
plot(tsNLData)
I have also tried the following:
NLDatats <- ts(NLData$gemeten_uitwisseling, frequency = 96)
However, when I plot the data I get the following:
How can I solve this?
There doesn't seem to be any problem with your graph. Your data come in 15-minute intervals and you are plotting 4 years' worth of it, which is roughly 140,000 observations (96 per day). So naturally it will look like a dark shaded region, because there is no way to show that many data points in a single plot.
If you are struggling to handle this much data, you can consider taking a random sample from your data frame before plotting, although this will largely remove seasonality and autocorrelation from what you see. That can be helpful if you want to see the average level of your outcome over time, but it is not as helpful for seeing the seasonal and autocorrelation structure in the data.
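The effect of random sampling on autocorrelation can be checked numerically. This is only a minimal sketch on a simulated AR(1) series; the seed, the sample size of 500, and the helper name `lag1` are illustrative assumptions, not anything from your data.

```r
set.seed(42)

# Simulated AR(1) series with coefficient 0.88 (illustrative values)
sim <- arima.sim(model = list(ar = 0.88), n = 10000, sd = 0.3)
y <- as.numeric(sim)

# Hypothetical helper: correlation between each value and the next one
lag1 <- function(v) cor(v[-1], v[-length(v)])

# Random sample of 500 time points, kept in time order
keep <- sort(sample(length(y), 500))

lag1(y)        # full series: close to the AR coefficient
lag1(y[keep])  # sampled series: much lower, neighbours are now far apart in time
```

Neighbouring points in the sample are on average about 20 time steps apart, so most of the short-range dependence is no longer visible.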
See the code below that uses dplyr and ggplot2 to plot some simulated time series that illustrates these issues. It's always best to start with simulated data and then work with your own data.
library(ggplot2)
library(dplyr)
# Simulate an AR(1) series with 10,000 observations
sim_data <- arima.sim(model = list(ar = 0.88, order = c(1, 0, 0)), n = 10000, sd = 0.3)
# Build the data frame once; tibble() replaces the deprecated data_frame()
sim_df <- tibble(y = as.numeric(sim_data), x = 1:10000)
# Too many points
sim_df %>% ggplot(aes(y = y, x = x)) + geom_line() +
  theme_minimal() + xlab('Time') + ylab('Y_t')
# Sample from data (random sample of 500 points)
# However, this will remove autocorrelation/seasonality
sim_df %>% sample_n(500) %>%
  ggplot(aes(y = y, x = x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')
# Plot a subset, which preserves autocorrelation and seasonality
sim_df %>% slice(1:300) %>%
  ggplot(aes(y = y, x = x)) + geom_line() + theme_minimal() + xlab('Time') + ylab('Y_t')
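The same subset idea can be applied to a zoo series with real timestamps, like the one you built from `NLData$date`. The sketch below uses made-up data as a stand-in for your series: the values, the 30-day length, and the names `idx`, `load_values`, `z` and `one_week` are all assumptions for illustration. It uses `window()` from zoo to keep a single week.

```r
library(zoo)

set.seed(1)

# Hypothetical stand-in for NLData: 30 days of 15-minute readings (96 per day)
n <- 30 * 96
idx <- seq(as.POSIXct("2010-01-01 00:00:00", tz = "UTC"),
           by = "15 min", length.out = n)
load_values <- 2600 + 300 * sin(2 * pi * seq_len(n) / 96) + rnorm(n, sd = 50)
z <- zoo(load_values, order.by = idx)

# Keep only the first week; window() includes both endpoints
one_week <- window(z,
                   start = as.POSIXct("2010-01-01 00:00:00", tz = "UTC"),
                   end   = as.POSIXct("2010-01-07 23:45:00", tz = "UTC"))

# The daily cycle is now visible instead of a solid band
plot(one_week, xlab = "Time", ylab = "Load")
```

With your own data you would replace `z` with your `ZOO` object and pick whatever date range you want to inspect.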