machine-learning keras lstm recurrent-neural-network tflearn

LSTM and labels

Let's start off with "I know ML cannot predict stock markets better than monkeys." But I just want to go through with it.

My question is a theoretical one. Say I have date, open, high, low, and close as columns. So I guess I have 4 features: open, high, low, and close.

'my_close' is going to be my label (the answer), and I will use the 'close' from 7 days after the current row. Basically, I shift the 'close' column up 7 rows and make it a new column called 'my_close'.
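For reference, the shift described above can be sketched in pandas roughly like this. The toy values and the names `horizon` and `labelled` are made up for illustration; only 'close' and 'my_close' come from the description.

```python
import pandas as pd

# Toy frame standing in for the real price data (values are made up).
df = pd.DataFrame({"close": [float(v) for v in range(100, 110)]})

horizon = 7  # how many rows ahead the label is taken from

# shift(-horizon) moves values up, so row t holds the close of row t + horizon.
df["my_close"] = df["close"].shift(-horizon)

# The last `horizon` rows have no future close, so their label is NaN; drop them.
labelled = df.dropna(subset=["my_close"])
```

Note that the final 7 rows cannot be used for training, since their label would lie beyond the end of the data.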

LSTMs work on sequences. So say the sequence length I set is 20 days. Hence my shape will be (1000 days of data, 20 days as a sequence, 3 features).

The problem that is bothering me is: should these 20 days (rows) of data all have the exact same label, or can they have individual labels? Or have I misunderstood the whole theory?

Thanks guys.

Solution

In your case, you want to predict the current day's stock price using the previous 7 days of stock values. The way you are building your inputs and outputs requires some modification before feeding them into the model.

You are making a mistake in understanding timesteps (in your sequences). Timesteps, in layman's terms, is the number of consecutive inputs the model considers when predicting one output. In your case, it will be 7 (not 20), as we will be using the previous 7 days of data to predict the current day's output.

Your input should be the previous 7 days of info:

[F11, F12, F13], [F21, F22, F23], ..., [F71, F72, F73]

In Fij, F stands for a feature value, i is the timestep, and j is the feature number.

The output will be the stock price of the 8th day. Here your model will analyze the previous 7 days of inputs and predict the output. So, to answer your question: each 7-day input sequence gets one common label, not one label per row.
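A minimal sketch of that windowing with NumPy, assuming 3 features per day and random made-up data. The helper `make_windows` and its parameter names are hypothetical, not part of any library.

```python
import numpy as np

def make_windows(features, target, timesteps):
    """Return X of shape (samples, timesteps, n_features) and y of shape (samples,)."""
    X, y = [], []
    for end in range(timesteps, len(features)):
        X.append(features[end - timesteps:end])  # the `timesteps` days before day `end`
        y.append(target[end])                    # one label for the whole window
    return np.array(X), np.array(y)

# Made-up data: 30 days, 3 features per day, plus a price series to predict.
rng = np.random.default_rng(0)
features = rng.random((30, 3))
close = rng.random(30)

X, y = make_windows(features, close, timesteps=7)
print(X.shape, y.shape)  # (23, 7, 3) (23,)
```

X is in the (samples, timesteps, features) layout that a Keras LSTM layer expects, and y holds exactly one value per window.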

I strongly recommend studying LSTMs a bit more.