Tags: python, tensorflow, keras, neural-network, lstm

How to feed an LSTM model in Keras (Python)?


I have read about LSTMs and I know that the algorithm takes the values of the previous words into account when predicting the next word.

Now I am trying to apply my first LSTM algorithm. I have this code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(LSTM(units=6, input_shape = (X_train_count.shape[0], X_train_count.shape[1]), return_sequences = True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=6, return_sequences=True))
model.add(LSTM(units=ytrain.shape[1], return_sequences=True, name='output'))
model.compile(loss='cosine_proximity', optimizer='sgd', metrics = ['accuracy'])



model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.summary()

cp=ModelCheckpoint('model_cnn.hdf5',monitor='val_acc',verbose=1,save_best_only=True)


history = model.fit(X_train_count, ytrain,
                    epochs=20,
                    verbose=False,
                    validation_data=(X_test_count, yval),
                    batch_size=10,
                    callbacks=[cp])

1- I cannot see how the LSTM would know the word sequence when my dataset is built based on TF-IDF?

2- I am getting this error:

ValueError: Input 0 of layer sequential_8 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 18644]

Solution

  • The issue seems to be in the shape of X_train_count that you are passing in; the LSTM input shape is always tricky.

    If your X_train_count is not 3D, then reshape it using the line below.

    X_train_count = X_train_count.reshape(X_train_count.shape[0], X_train_count.shape[1], 1)
    

    In the LSTM layer, the input_shape should be (timesteps, data_dim).

    Below is an example to illustrate this.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    import tensorflow as tf
    from tensorflow import keras
    
    X = ["first example", "one more", "good morning"]
    Y = ["first example", "one more", "good morning"]
    
    vectorizer = TfidfVectorizer().fit(X)
    
    # Dense TF-IDF matrices of shape (samples, vocabulary_size)
    tfidf_vector_X = vectorizer.transform(X).toarray()
    tfidf_vector_Y = vectorizer.transform(Y).toarray()
    
    # Add a trailing axis so each sample is (timesteps, data_dim) = (vocabulary_size, 1)
    tfidf_vector_X = tfidf_vector_X[:, :, None]
    tfidf_vector_Y = tfidf_vector_Y[:, :, None]
    
    X_train, X_test, y_train, y_test = train_test_split(tfidf_vector_X, tfidf_vector_Y, test_size=0.2, random_state=1)
    
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM
    
    model = Sequential()
    model.add(LSTM(units=6, input_shape=X_train.shape[1:], return_sequences=True))
    model.add(LSTM(units=6, return_sequences=True))
    model.add(LSTM(units=6, return_sequences=True))
    model.add(LSTM(units=1, return_sequences=True, name='output'))
    # Note: newer TF releases renamed 'cosine_proximity' to 'cosine_similarity'
    model.compile(loss='cosine_proximity', optimizer='sgd', metrics=['accuracy'])
    

    Model Summary:

    Model: "sequential_3"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    lstm_9 (LSTM)                (None, 6, 6)              192       
    _________________________________________________________________
    lstm_10 (LSTM)               (None, 6, 6)              312       
    _________________________________________________________________
    lstm_11 (LSTM)               (None, 6, 6)              312       
    _________________________________________________________________
    output (LSTM)                (None, 6, 1)              32        
    =================================================================
    Total params: 848
    Trainable params: 848
    Non-trainable params: 0
    _________________________________________________________________
    None  
    

    Here X_train is of shape (2, 6, 1): 2 training samples, 6 timesteps (one per vocabulary term), and 1 feature per timestep.
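
    As a quick sanity check, a minimal training call on this toy data could look like the sketch below; the epoch count and batch size are arbitrary illustrative values, not from the original answer.

    model.fit(X_train, y_train,
              epochs=5,
              batch_size=1,
              validation_data=(X_test, y_test))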

    To add to the solution, I would suggest going with a dense vector instead of the sparse vector generated by the TF-IDF representation: replace it with pre-trained models like the Google News vectors or GloVe used as weights to an embedding layer, which would be better both performance-wise and result-wise. A rough sketch of this idea follows below.
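
    The snippet below is a minimal, assumption-laden illustration of that suggestion, not part of the original answer: the glove.6B.100d.txt file path, the embedding dimension, and the padded sequence length max_len are all hypothetical placeholders you would adapt to your own data.

    import numpy as np
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense
    from tensorflow.keras.preprocessing.text import Tokenizer
    
    texts = ["first example", "one more", "good morning"]  # toy corpus from the example above
    
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(texts)
    word_index = tokenizer.word_index
    
    vocab_size = len(word_index) + 1   # +1 for the padding index 0
    embedding_dim = 100                # must match the GloVe file used
    max_len = 5                        # hypothetical padded sequence length
    
    # Load GloVe vectors into a word -> vector dict (the file path is an assumption).
    embeddings_index = {}
    with open('glove.6B.100d.txt', encoding='utf-8') as f:
        for line in f:
            values = line.split()
            embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')
    
    # Copy the pre-trained vector for every word the tokenizer knows about.
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in word_index.items():
        if word in embeddings_index:
            embedding_matrix[i] = embeddings_index[word]
    
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim,
                        weights=[embedding_matrix],
                        input_length=max_len,
                        trainable=False))  # keep the pre-trained vectors frozen
    model.add(LSTM(units=6))
    model.add(Dense(1))

    With this setup the model consumes sequences of word indices instead of TF-IDF rows, so the word order asked about in the question is actually preserved.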