Tags: python, matplotlib, keras, finance, quandl

Graphing The Results Of A Keras Stock Market Predictive Neural Network


I recently tried to build a neural network that predicts fluctuations in the prices of individual stocks, using Keras as the framework for the network and Quandl to retrieve historical stock prices. I wrote the program in Google Colaboratory; it is shown below:

import tensorflow as tf
import keras
import numpy as np
import quandl
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd

df = quandl.get("WIKI/FB", api_key = '_msxC6xspj2ddytz7-4u')

print(df)

df = df.reset_index()
df = df[['Adj. Close', 'Date']]

forecast_out = 1
df['Prediction'] = df[['Adj. Close']].shift(-(forecast_out))

X = np.array(df.drop(['Prediction'], 1))
X = X[:-forecast_out]
y = np.array(df['Prediction'])
y = y[:-forecast_out]

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.2) 
model = keras.models.Sequential()
model.add(keras.layers.Dense(units = 64, activation = 'relu'))
model.add(keras.layers.Dense(units = 1, activation = 'linear'))

model.compile(loss='mean_absolute_error',
              optimizer='adam',
              metrics=['accuracy'])

History = model.fit(x_train, y_train, epochs=8)

prediction = model.predict(x_test)

My main question is whether there is a way to graph this data so that x_test and the prediction for it appear on the same graph. Having little experience with this use of Python, I tried plotting the data directly with:

plt.plot(x_test)
plt.plot(prediction)

However, this produced the following graph:

[Image: graph produced by the two plt.plot calls above]

The aim of the program is to predict the prices of a particular stock over a particular time period, so I need results similar to those shown at the end of the article below:

https://towardsdatascience.com/neural-networks-to-predict-the-market-c4861b649371

A graph like the one in that article would make it much easier to judge how well the program works. Is there a way to produce such a graph, or otherwise to see concrete results like this? Thank you for your assistance.


Solution

  • An important thing to note is that your train and test data will necessarily be on separate parts of the x-axis.

    For instance, suppose that the training set consists of 100 observations and the test set of 15. The test set is the later part of the time series, which the model built on the training set is then used to predict. Note that train_test_split shuffles the rows by default, so your x_test is not a contiguous later stretch of the series; passing shuffle=False keeps the chronological order.

    Consider an example of using an LSTM to predict fluctuations in weekly hotel cancellations.

    In that example, the data is scaled with MinMaxScaler before the training and validation predictions are generated, so that the neural network can interpret it properly. From what I can see, you have not performed this step in your example. You should do so, as your results are likely to be unreliable otherwise: your data is not on a common scale, so the network cannot interpret it properly.

    # Generate predictions
    trainpred = model.predict(X_train)
    valpred = model.predict(X_val)
    
    In [30]:
    
    trainpred
    
    Out[30]:
    
    array([[0.32363528],
           [0.3715328 ],
           [0.46051228],
           [0.35137814],
           [0.38220662],
           [0.41239697],
           [0.3573438 ],
           [0.43657327],
           [0.47494155],
           [0.467317  ],
           [0.49233937],
           [0.49879026],
           [0.39996487],
           [0.38200712],
           [0.3309482 ],
           [0.21176702],
           [0.22578238],
           [0.18523258],
           [0.23222469],
           [0.26659006],
           [0.2368085 ],
           [0.22137557],
           [0.28356454],
           [0.16753006],
           [0.16966385],
           [0.22060908],
           [0.1916717 ],
           [0.2181809 ],
           [0.21772115],
           [0.24777801],
           [0.3288507 ],
           [0.30944437],
           [0.33784014],
           [0.37927932],
           [0.31557906],
           [0.43595707],
           [0.3505273 ],
           [0.4064384 ],
           [0.48314226],
           [0.41506904],
           [0.48799258],
           [0.4533432 ],
           [0.45297146],
           [0.46697432],
           [0.41320056],
           [0.45331544],
           [0.48461175],
           [0.50513804],
           [0.50340337],
           [0.44235045],
           [0.48495632],
           [0.32804203],
           [0.38383847],
           [0.3502031 ],
           [0.34179717],
           [0.37928385],
           [0.3852548 ],
           [0.3978842 ],
           [0.41324353],
           [0.42388642],
           [0.43424374],
           [0.4359951 ],
           [0.49112016],
           [0.49098223],
           [0.50581044],
           [0.5686604 ],
           [0.48814237],
           [0.5679423 ],
           [0.519874  ],
           [0.42899352],
           [0.4314267 ],
           [0.3878218 ],
           [0.3585053 ],
           [0.31897143]], dtype=float32)
    
    In [31]:
    
    valpred
    
    Out[31]:
    
    array([[0.374565  ],
           [0.311441  ],
           [0.37602562],
           [0.36187553],
           [0.35613692],
           [0.399751  ],
           [0.40736055],
           [0.41798282],
           [0.36257237],
           [0.4636013 ],
           [0.47177172],
           [0.45880812],
           [0.5725181 ],
           [0.5696718 ]], dtype=float32)
    

    The predictions are converted back to normal values:

    # Convert predictions back to normal values
    trainpred = scaler.inverse_transform(trainpred)
    Y_train = scaler.inverse_transform([Y_train])
    valpred = scaler.inverse_transform(valpred)
    Y_val = scaler.inverse_transform([Y_val])
    predictions = valpred
    

    The predictions are then plotted:

    
    In [34]:
    
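    # Train predictions
    # `previous` is defined earlier in the original notebook (not shown here);
    # presumably it is the look-back window used to build the LSTM inputs
    trainpredPlot = np.empty_like(df)
    trainpredPlot[:, :] = np.nan
    trainpredPlot[previous:len(trainpred)+previous, :] = trainpred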
    
    In [35]:
    
    # Validation predictions
    valpredPlot = np.empty_like(df)
    valpredPlot[:, :] = np.nan
    valpredPlot[len(trainpred)+(previous*2)+1:len(df)-1, :] = valpred
    
    In [36]:
    
    # Plot all predictions
    inversetransform, = plt.plot(scaler.inverse_transform(df))
    trainpred, = plt.plot(trainpredPlot)
    valpred, = plt.plot(valpredPlot)
    plt.xlabel('Number of weeks')
    plt.ylabel('Cancellations')
    plt.title("Predicted vs. Actual Cancellations Per Week")
    plt.show()
    

    The graph now displays as follows:

    [Image: predicted vs. actual hotel cancellations per week]

    Two points in summary:

    1. Ensure that when the real data is plotted, the training and test predictions do not overlap. Overlapping them would be wrong, as they are two different sets of predictions covering different parts of the series.

    2. Scale your data before feeding it into the neural network. Otherwise the network will not know how to interpret the data, and any results will be unreliable.
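
As a rough illustration of the two points above, here is a minimal sketch of how the steps could fit together for a single price series. It is not the code from the hotel example: the helper names `chronological_split` and `pad_for_plot` are hypothetical, the `prices` array is synthetic rather than real market data, and the scaled inputs stand in for the output of `model.predict` so that the sketch runs without training a network.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def chronological_split(values, test_fraction=0.2):
    """Split a 1-D series into an earlier training part and a later test part."""
    n_test = int(round(len(values) * test_fraction))
    split = len(values) - n_test
    return values[:split], values[split:]


def pad_for_plot(total_len, start, predictions):
    """Return a NaN-filled array with the predictions placed at their x positions."""
    padded = np.full(total_len, np.nan)
    padded[start:start + len(predictions)] = np.ravel(predictions)
    return padded


# Synthetic stand-in for a price series (assumption: not real market data)
prices = np.linspace(100.0, 150.0, 50)

# Point 1: keep the test set as the later part of the series (no shuffling)
train, test = chronological_split(prices, test_fraction=0.2)

# Point 2: fit the scaler on the training data only, then apply it to both parts
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train.reshape(-1, 1))
test_scaled = scaler.transform(test.reshape(-1, 1))

# Placeholder for model.predict(...): the scaled inputs are reused here,
# then converted back to the original price scale
train_pred = scaler.inverse_transform(train_scaled)
test_pred = scaler.inverse_transform(test_scaled)

# Place each set of predictions on its own stretch of the x-axis
train_plot = pad_for_plot(len(prices), 0, train_pred)
test_plot = pad_for_plot(len(prices), len(train), test_pred)
```

Passing `prices`, `train_plot` and `test_plot` to three successive `plt.plot` calls, as in the hotel example, should then draw the actual series with the two sets of predictions on separate parts of the x-axis, since matplotlib leaves gaps where the values are NaN.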