Search code examples
pythonmachine-learningtime-seriesarimaforecast

Prediction of time series with minute interval using arima for python


I am a beginner in machine learning for time series, I need to develop a project, where my data is composed of minutes, could someone help me create this algorithm?

Data set: Each value represents one minute of collection (9:00, 9:01 ...), the collection lasts 10 minutes and was performed in 2 months, that is, 10 values ​​for January and 10 values for the month of February.

enter image description here

Complete data

Objective: I would like my result to be a forecast of the next 10 minutes for month of March, example:

2020-03-01 9:00:00
2020-03-01 9:01:00
2020-03-01 9:02:00
2020-03-01 9:03:00

Training: The training must contain the month of January and February as a reference for forecasting, taking into account that it is a time series

Seasonal:

enter image description here

Forecast:

Current problem: it seems that the current forecast is failing, the previous data does not seem to be valid as a time series, because, as can be seen in the seasonality image, the data set is shown as a straight line. The forecast is represented by the green line in the figure below, and the original data by the blue line, however as we see the date axis is going until 2020-11-01, it should go until 2020-03-01, in addition the original data form a rectangle in the graph

enter image description here

script.py

# -*- coding: utf-8 -*-

try:
    import pandas as pd
    import numpy as np
    import pmdarima as pm
    #%matplotlib inline
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.tsa.seasonal import seasonal_decompose
    from dateutil.parser import parse
except ImportError as e:
    print("[FAILED] {}".format(e))

class operationsArima():

    @staticmethod
    def ForecastingWithArima():

        try:

            # Import
            data = pd.read_csv('minute.csv', parse_dates=['date'], index_col='date')

            # Plot
            fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)

            # Usual Differencing
            axes[0].plot(data[:], label='Original Series')
            axes[0].plot(data[:].diff(1), label='Usual Differencing')
            axes[0].set_title('Usual Differencing')
            axes[0].legend(loc='upper left', fontsize=10)
            print("[OK] Generated axes")

            # Seasonal
            axes[1].plot(data[:], label='Original Series')
            axes[1].plot(data[:].diff(11), label='Seasonal Differencing', color='green')
            axes[1].set_title('Seasonal Differencing')
            plt.legend(loc='upper left', fontsize=10)
            plt.suptitle('Drug Sales', fontsize=16)
            plt.show()

            # Seasonal - fit stepwise auto-ARIMA
            smodel = pm.auto_arima(data, start_p=1, start_q=1,
                                    test='adf',
                                    max_p=3, max_q=3, m=11,
                                    start_P=0, seasonal=True,
                                    d=None, D=1, trace=True,
                                    error_action='ignore',
                                    suppress_warnings=True,
                                    stepwise=True)

            smodel.summary()
            print(smodel.summary())
            print("[OK] Generated model")

            # Forecast
            n_periods = 11
            fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
            index_of_fc = pd.date_range(data.index[-1], periods = n_periods, freq='MS')

            # make series for plotting purpose
            fitted_series = pd.Series(fitted, index=index_of_fc)
            lower_series = pd.Series(confint[:, 0], index=index_of_fc)
            upper_series = pd.Series(confint[:, 1], index=index_of_fc)
            print("[OK] Generated series")

            # Plot
            plt.plot(data)
            plt.plot(fitted_series, color='darkgreen')
            plt.fill_between(lower_series.index,
                            lower_series,
                            upper_series,
                            color='k', alpha=.15)

            plt.title("ARIMA - Final Forecast - Drug Sales")
            plt.show()
            print("[SUCESS] Generated forecast")

        except Exception as e:

            print("[FAILED] Caused by: {}".format(e))

if __name__ == "__main__":
    flow = operationsArima()
    flow.ForecastingWithArima() # Init script

Sumary:

                                SARIMAX Results                                 
================================================================================
Dep. Variable:                        y   No. Observations:                   22
Model:             SARIMAX(0, 1, 0, 11)   Log Likelihood                     nan
Date:                  Mon, 13 Apr 2020   AIC                                nan
Time:                          21:19:10   BIC                                nan
Sample:                               0   HQIC                               nan
                                   - 22                                         
Covariance Type:                    opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept           0   5.33e-13          0      1.000   -1.05e-12    1.05e-12
sigma2          1e-10   5.81e-10      0.172      0.863   -1.04e-09    1.24e-09
===================================================================================
Ljung-Box (Q):                         nan   Jarque-Bera (JB):                  nan
Prob(Q):                               nan   Prob(JB):                          nan
Heteroskedasticity (H):                nan   Skew:                              nan
Prob(H) (two-sided):                   nan   Kurtosis:                          nan
===================================================================================

Solution

  • I see a couple of problems here: As you have two short 1-minute frequency time series with a month separation, it is normal to observe the straight line in your blue line that you mention. In addition, the green line looks like the original data itself, what means that the model's forecast is exactly the same as your original data.

    Finally, I don't think it's a good idea to stick together two separate time-series...