I've built an LSTM model (see below) and trained it. My loss function is binary cross entropy as I'm doing binary classification. The training y data is a set of 0's and 1's.
When I run model.predict(x_test_scaled) I get a single series of values ranging between 0 and 1. I'm guessing these are probabilities, but is it the probability that the output = 0 or that the output = 1?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(units=512, input_shape = (X_train.shape[1],X_train.shape[2]), return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=512, return_sequences=False))
model.add(Dropout(0.3))
model.add(Dense(264, activation='tanh'))
model.add(Dense(1))
I'm quite rusty with ANNs, but maybe I can help.
model.predict passes the input through the model and returns the output tensor for each data point. Since the last layer in your model is a single Dense neuron, the output for each data point is a single value. And since you didn't specify an activation for that last layer, it defaults to linear activation.
Because you're solving a classification problem, you probably need sigmoid activation. If memory doesn't betray me, you can also get by by treating it as multi-class classification with 2 classes (essentially 2 output neurons with softmax activation). Either way, linear activation is not suited for classification problems.
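For example, something along these lines (just a sketch of the output layer and compile step; the optimizer and metrics here are assumptions, keep whatever you're already compiling with):

# Option 1: single sigmoid output with binary cross entropy
# model.predict() then returns P(output = 1) for each sample
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Option 2: two softmax outputs, one per class
# model.predict() then returns two probabilities per sample that sum to 1
# model.add(Dense(2, activation='softmax'))
# model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')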
MORE NOTES: understanding your model
If you want to understand the output you're getting, you have to understand the task you're training the model to solve.
With each data point, you tell the model "the output for this is 1" or "the output for this is 0". But it looks at the output not as a class label for the input, but rather as a value in the range [0, 1], and it's being trained to emit values in this range.
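To make that concrete, here's a toy illustration of how the loss sees things (the numbers are made up, and this assumes TensorFlow 2.x Keras):

import numpy as np
from tensorflow.keras.losses import binary_crossentropy

y_true = np.array([0., 1., 1., 0.])      # the labels you train on, used directly as target values
y_pred = np.array([0.1, 0.9, 0.6, 0.4])  # what the model emits, values in [0, 1]
# binary cross entropy pushes the predictions towards 0 where the label is 0
# and towards 1 where the label is 1
print(binary_crossentropy(y_true, y_pred).numpy())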
Have a look at the training y tensor. Let's assume an output of 0 matches the first class, and an output of 1 matches the second class.
In this case, the more certain your model is about the input, the farther from 0.5 its output will be.
So, a value of 0.1 means your model is fairly certain the input belongs to class 1 (the output is close to 0).
If the output is 0.9999, then it thinks the input belongs to class 2 with very high certainty (the output is very close to 1).
If the output, on the other hand, is something like 0.45 (very close to 0.5), then the model is thinking "maybe the input belongs to class 1, but I'm really not sure about it".
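So, to directly answer the question: with a sigmoid output and binary cross entropy, the value model.predict returns is the probability that the label is 1, and you can turn it into hard 0/1 predictions by thresholding at 0.5. Roughly (a sketch, assuming the sigmoid output layer from above):

probs = model.predict(x_test_scaled)    # shape (n_samples, 1), each value is P(label = 1)
labels = (probs > 0.5).astype('int32')  # 1 where the model leans towards label 1, otherwise 0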
Hope this helped a little, I'll also upvote any answer that's more accurate.