python tensorflow keras time-series lstm

LSTM Keras - Many to many classification ValueError: incompatible shapes


I have just started implementing an LSTM in Python with TensorFlow / Keras to test out an idea, but I am struggling to create a working model. This post is mainly about a ValueError that I keep getting (see the code at the bottom), but any help with building a proper LSTM model for the problem below is greatly appreciated.

For each day, I want to predict which of a group of events will occur. Some events recur or always occur after a certain amount of time has passed, whereas others occur only rarely or without any structure. An LSTM should be able to pick up on the recurring events and predict their occurrences for future days.

To represent the events, I use a list of 0s and 1s (non-occurrence and occurrence). For example, with the events ["Going to school", "Going to the gym", "Buying a computer"] I have lists like [1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0] etc. The idea is that the LSTM will recognize that I go to school every day, go to the gym every other day, and that buying a computer is very rare. So following that sequence of vectors, for the next day it should predict [1, 0, 0].
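The encoding described above can be sketched in plain Python. This is only an illustration: `encode_day` is a hypothetical helper name, not something from the original post.

```python
# The fixed, ordered list of events from the example above
EVENTS = ["Going to school", "Going to the gym", "Buying a computer"]

def encode_day(occurred, events=EVENTS):
    """Return a 0/1 list with one entry per event (hypothetical helper)."""
    occurred = set(occurred)
    return [1 if event in occurred else 0 for event in events]

day = encode_day(["Going to school", "Buying a computer"])
print(day)  # [1, 0, 1]
```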

So far I have done the following:

  1. Create x_train: a numpy.array with shape (305, 60, 193). Each entry of x_train contains 60 consecutive days, where each day is represented by a vector of the same 193 events, encoded as described above.
  2. Create y_train: a numpy.array with shape (305, 1, 193). Similar to x_train, but y_train only contains 1 day per entry.

x_train[0] consists of days 1, 2, ..., 60 and y_train[0] contains day 61. x_train[1] then contains days 2, ..., 61 and y_train[1] contains day 62, etc. The idea is that the LSTM should learn to use data from the past 60 days, and that it can then iteratively start predicting/generating new vectors of event occurrences for future days.
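The windowing described above can be sketched with numpy as follows. This is a minimal sketch, not the code used in the question: `make_windows` is a hypothetical helper, the random `days` array stands in for the real data, and the 365-day length is an assumption chosen only because it yields the 305 windows mentioned above.

```python
import numpy as np

def make_windows(days, window=60):
    """Split (n_days, n_features) into sliding inputs and next-day targets."""
    n_windows = len(days) - window
    # Each input holds `window` consecutive days
    x = np.stack([days[i:i + window] for i in range(n_windows)])
    # Each target is the single day that follows, kept as a length-1 sequence
    y = np.stack([days[i + window:i + window + 1] for i in range(n_windows)])
    return x, y

# Assumed stand-in data: 365 days, 193 binary event indicators per day
rng = np.random.default_rng(0)
days = rng.integers(0, 2, size=(365, 193))

x_train, y_train = make_windows(days)
print(x_train.shape)  # (305, 60, 193)
print(y_train.shape)  # (305, 1, 193)
```

With this layout the last day of x_train[1] is the same day as y_train[0], which matches the overlap described above.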

I am really struggling with how to create a simple implementation of an LSTM that can handle this. So far I think I have figured out the following:

  1. I need to start with the block of code below, where N_INPUTS = 60 and N_FEATURES = 193. I am not sure what N_BLOCKS should be, or whether its value is strictly bound by some conditions. EDIT: According to https://zhuanlan.zhihu.com/p/58854907 it can be whatever I want.
model = Sequential()
model.add(LSTM(N_BLOCKS, input_shape=(N_INPUTS, N_FEATURES)))
  2. I should probably add a dense layer. If I want the output of my model to be a vector with the 193 events, it should look as follows:
model.add(layers.Dense(193, activation='linear'))  # or some other activation function
  3. I can also add a dropout layer to prevent overfitting, for example with model.add(layers.Dropout(0.2)), where 0.2 is the fraction of inputs randomly set to 0 during training.
  4. I need to add a model.compile(loss=..., optimizer=...). I am not sure whether the loss function (e.g. MSE or categorical_crossentropy) and optimizer matter if I just want a working implementation.
  5. I need to train my model, which I can achieve by using model.fit(x_train, y_train).
  6. If all of the above works well, I can start to predict values for the next day using model.predict(the 60 days before the day I want to predict).
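On the question of whether the output activation matters: the numpy sketch below, which is not from the original post, compares softmax and sigmoid on made-up raw outputs for three events. Softmax outputs always sum to 1, so it cannot express "two events occur today", whereas sigmoid scores each event independently. For targets where several events can occur on the same day, a sigmoid output is usually paired with the binary_crossentropy loss in Keras; whether that suits this data is an assumption to verify.

```python
import numpy as np

def softmax(z):
    """Normalize scores so that they sum to 1 across events."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    """Map each score to (0, 1) independently of the others."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw outputs: strong evidence for events 0 and 1, against event 2
logits = np.array([4.0, 4.0, -4.0])

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax.round(3))  # [0.5 0.5 0. ]
print(p_sigmoid.round(3))  # [0.982 0.982 0.018]
```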

One of my attempts can be seen here:

from tensorflow import keras
from tensorflow.keras import layers

print(x_train.shape)
print(y_train.shape)

model = keras.Sequential()
model.add(layers.LSTM(256, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[2], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train) #<- This line causes the ValueError

Output:
(305, 60, 193)
(305, 1, 193)
Model: "sequential_29"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_27 (LSTM)              (None, 256)               460800    
                                                                 
 dense_9 (Dense)             (None, 1)                 257       
                                                                 
=================================================================
Total params: 461,057
Trainable params: 461,057
Non-trainable params: 0
_________________________________________________________________
ValueError: Shapes (None, 1, 193) and (None, 193) are incompatible 

Alternatively, I have tried replacing the line model.add(layers.Dense(y_train.shape[2], activation='softmax')) with model.add(layers.Dense(y_train.shape[1], activation='softmax')). This produces ValueError: Shapes (None, 1, 193) and (None, 1) are incompatible.

Are my ideas somewhat okay? How can I resolve this ValueError? Any help would be greatly appreciated.

EDIT: As suggested in the comments, changing the shape of y_train did the trick.

print(x_train.shape)
print(y_train.shape)

model = keras.Sequential()
model.add(layers.LSTM(193, input_shape=(x_train.shape[1], x_train.shape[2])))  # The 193 can be any number; see: https://zhuanlan.zhihu.com/p/58854907
model.add(layers.Dropout(0.2))
model.add(layers.Dense(y_train.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
model.fit(x_train,y_train)


(305, 60, 193)
(305, 193)
Model: "sequential_40"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_38 (LSTM)              (None, 193)               298764    
                                                                 
 dropout_17 (Dropout)        (None, 193)               0         
                                                                 
 dense_16 (Dense)            (None, 193)               37442     
                                                                 
=================================================================
Total params: 336,206
Trainable params: 336,206
Non-trainable params: 0
_________________________________________________________________
10/10 [==============================] - 3s 89ms/step - loss: 595.5011

Now I am stuck on the fact that model.predict(x) requires x to be of the same shape as x_train, and will output an array with the same shape as y_train. I was hoping only one set of 60 days would be required to output the 61st day. Does anyone know how to achieve this?
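A Keras model's predict expects a leading batch axis, so a single 60-day window of shape (60, 193) can be passed as an array of shape (1, 60, 193). The sketch below shows that reshaping together with the iterative generation described earlier. To stay runnable without a trained model it uses `predict_stub`, a hypothetical stand-in that only mimics the shape contract of model.predict; `forecast` and the 0.5 threshold are also assumptions, and thresholding only makes sense if the outputs are per-event probabilities.

```python
import numpy as np

N_INPUTS, N_FEATURES = 60, 193

def predict_stub(batch):
    """Hypothetical stand-in for model.predict: (batch, 60, 193) -> (batch, 193)."""
    assert batch.ndim == 3 and batch.shape[1:] == (N_INPUTS, N_FEATURES)
    # Placeholder logic only: per-event frequency over the window
    return batch.mean(axis=1)

def forecast(predict, last_window, n_days, threshold=0.5):
    """Generate n_days of 0/1 vectors by feeding predictions back in."""
    window = last_window.astype(float).copy()
    generated = []
    for _ in range(n_days):
        # Add the batch axis for the call, then take the single result row
        probs = predict(window[np.newaxis, ...])[0]
        day = (probs >= threshold).astype(float)
        generated.append(day)
        # Drop the oldest day and append the newly generated one
        window = np.vstack([window[1:], day])
    return np.array(generated)

rng = np.random.default_rng(0)
last_window = rng.integers(0, 2, size=(N_INPUTS, N_FEATURES))

future = forecast(predict_stub, last_window, n_days=7)
print(future.shape)  # (7, 193)
```

With a trained model, model.predict would be passed in place of predict_stub, assuming it returns an array of shape (1, 193) for a single window.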


Solution

  • The solution may be to give y_train the shape (305, 193) instead of (305, 1, 193), since you predict a single day. This changes only the shape, not the data. You should then be able to train and predict, using model.add(layers.Dense(y_train.shape[1], activation='softmax')) as the output layer.
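A minimal numpy sketch of that reshape, using a zero-filled placeholder array instead of the real targets:

```python
import numpy as np

# Placeholder targets with the original shape (305, 1, 193)
y_train = np.zeros((305, 1, 193))
y_train[0, 0, 5] = 1.0  # mark one event so the data can be traced

# Drop the length-1 middle axis; np.squeeze(y_train, axis=1) is equivalent
y_flat = y_train.reshape(y_train.shape[0], y_train.shape[2])

print(y_flat.shape)  # (305, 193)
print(y_flat[0, 5])  # 1.0
```

After this step y_flat.shape[1] is 193, which is why the Dense layer in the answer is sized with y_train.shape[1].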