I am a beginner in machine learning for time series and need to develop a project where my data is sampled by the minute. Could someone help me create this algorithm?
Data set: Each value represents one minute of collection (9:00, 9:01, ...). Each collection lasts 10 minutes and was performed in each of 2 months, that is, 10 values for January and 10 values for February.
Objective: I would like my result to be a forecast of the next 10 minutes for the month of March, for example:
2020-03-01 9:00:00
2020-03-01 9:01:00
2020-03-01 9:02:00
2020-03-01 9:03:00
Training: The training set must contain January and February as the reference for forecasting, taking into account that it is a time series.
Seasonal:
Forecast:
Current problem: The current forecast seems to be failing. The previous data does not seem to be valid as a time series because, as can be seen in the seasonality image, the data set is shown as a straight line. In the forecast figure, the forecast is represented by the green line and the original data by the blue line. However, the date axis goes up to 2020-11-01 when it should only go up to 2020-03-01, and the original data forms a rectangle in the graph.
script.py
# -*- coding: utf-8 -*-
try:
    import pandas as pd
    import numpy as np
    import pmdarima as pm
    #%matplotlib inline
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.tsa.seasonal import seasonal_decompose
    from dateutil.parser import parse
except ImportError as e:
    print("[FAILED] {}".format(e))

class operationsArima():
    @staticmethod
    def ForecastingWithArima():
        try:
            # Import
            data = pd.read_csv('minute.csv', parse_dates=['date'], index_col='date')
            # Plot
            fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)
            # Usual Differencing
            axes[0].plot(data[:], label='Original Series')
            axes[0].plot(data[:].diff(1), label='Usual Differencing')
            axes[0].set_title('Usual Differencing')
            axes[0].legend(loc='upper left', fontsize=10)
            print("[OK] Generated axes")
            # Seasonal
            axes[1].plot(data[:], label='Original Series')
            axes[1].plot(data[:].diff(11), label='Seasonal Differencing', color='green')
            axes[1].set_title('Seasonal Differencing')
            plt.legend(loc='upper left', fontsize=10)
            plt.suptitle('Drug Sales', fontsize=16)
            plt.show()
            # Seasonal - fit stepwise auto-ARIMA
            smodel = pm.auto_arima(data, start_p=1, start_q=1,
                                   test='adf',
                                   max_p=3, max_q=3, m=11,
                                   start_P=0, seasonal=True,
                                   d=None, D=1, trace=True,
                                   error_action='ignore',
                                   suppress_warnings=True,
                                   stepwise=True)
            smodel.summary()
            print(smodel.summary())
            print("[OK] Generated model")
            # Forecast
            n_periods = 11
            fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
            index_of_fc = pd.date_range(data.index[-1], periods = n_periods, freq='MS')
            # make series for plotting purpose
            fitted_series = pd.Series(fitted, index=index_of_fc)
            lower_series = pd.Series(confint[:, 0], index=index_of_fc)
            upper_series = pd.Series(confint[:, 1], index=index_of_fc)
            print("[OK] Generated series")
            # Plot
            plt.plot(data)
            plt.plot(fitted_series, color='darkgreen')
            plt.fill_between(lower_series.index,
                             lower_series,
                             upper_series,
                             color='k', alpha=.15)
            plt.title("ARIMA - Final Forecast - Drug Sales")
            plt.show()
            print("[SUCESS] Generated forecast")
        except Exception as e:
            print("[FAILED] Caused by: {}".format(e))

if __name__ == "__main__":
    flow = operationsArima()
    flow.ForecastingWithArima() # Init script
Summary:
                                SARIMAX Results
================================================================================
Dep. Variable:                      y   No. Observations:                   22
Model:           SARIMAX(0, 1, 0, 11)   Log Likelihood                     nan
Date:                Mon, 13 Apr 2020   AIC                                nan
Time:                        21:19:10   BIC                                nan
Sample:                             0   HQIC                               nan
                                 - 22
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept           0   5.33e-13          0      1.000   -1.05e-12    1.05e-12
sigma2          1e-10   5.81e-10      0.172      0.863   -1.04e-09    1.24e-09
===================================================================================
Ljung-Box (Q):                  nan   Jarque-Bera (JB):                  nan
Prob(Q):                        nan   Prob(JB):                          nan
Heteroskedasticity (H):         nan   Skew:                              nan
Prob(H) (two-sided):            nan   Kurtosis:                          nan
===================================================================================
I see a couple of problems here. Since you have two short series sampled at 1-minute frequency and separated by a month, it is normal to see the straight line you mention in the blue line. In addition, the green line looks like the original data itself, which means that the model's forecast is exactly the same as your original data.
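On the date axis specifically: the script builds the forecast index with freq='MS' (month start), so 11 forecast steps are spread over 11 months instead of 11 minutes. Here is a minimal sketch of the difference. The last observed timestamp and the March start time are hypothetical values chosen for illustration, not read from minute.csv.

```python
import pandas as pd

# Hypothetical last timestamp of the training data (assumed, not from minute.csv).
last_observed = pd.Timestamp("2020-02-01 09:10:00")
n_periods = 11

# What the script does: 'MS' advances one month start per step.
monthly_index = pd.date_range(last_observed, periods=n_periods, freq="MS")

# Minute-based alternative that starts on the day to be forecast
# (the 2020-03-01 09:00 start is an assumption based on the stated objective).
minute_index = pd.date_range("2020-03-01 09:00:00", periods=n_periods, freq="min")

print("monthly span:", monthly_index[-1] - monthly_index[0])
print("minute span: ", minute_index[-1] - minute_index[0])
```

With a minute-based index the forecast stays on a single day, which is what the objective in the question describes.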
Finally, I don't think it's a good idea to join two separate time series into one...
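One way to keep the two collections separate is to store each day as its own row, with one column per minute of the session. The sketch below uses synthetic values and an assumed column name 'value', since the real file is not shown. The per-minute mean at the end is only a naive baseline I am adding for illustration, not a replacement for a proper model.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for minute.csv: two 10-minute sessions one month apart.
# The values and the column name 'value' are assumptions.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01 09:00", periods=10, freq="min").append(
    pd.date_range("2020-02-01 09:00", periods=10, freq="min")
)
data = pd.DataFrame({"value": rng.normal(100, 5, size=len(index))}, index=index)

# One row per collection day, one column per minute of the session.
frame = data.assign(
    day=data.index.normalize(),
    minute=data.index.strftime("%H:%M"),
)
sessions = frame.pivot(index="day", columns="minute", values="value")

# Naive baseline: average each minute across the available sessions
# and stamp the result onto the assumed forecast day.
baseline = sessions.mean(axis=0)
forecast_index = pd.date_range("2020-03-01 09:00", periods=len(baseline), freq="min")
forecast = pd.Series(baseline.to_numpy(), index=forecast_index, name="forecast")

print(sessions.shape)
print(forecast)
```

With only two sessions there is very little to learn from, so any forecast built this way should be treated as a rough reference at best.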