python-3.x · keras · attention-model · summarization · encoder-decoder

Explain model.fit in an LSTM encoder-decoder with attention for text summarization using Keras/TensorFlow


In deep learning with Keras, I have usually come across model.fit looking something like this:

model.fit(x_train, y_train, epochs=50, callbacks=[es], batch_size=512, validation_data=(x_val, y_val))

Whereas in NLP tasks, I have seen some articles on text summarization using an LSTM encoder-decoder with attention, and I usually come across this code for fitting the model, which I'm not able to comprehend:

model.fit([x_tr, y_tr[:, :-1]],
          y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],
          epochs=50, callbacks=[es], batch_size=512,
          validation_data=([x_val, y_val[:, :-1]],
                           y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:]))

I have found no explanation of why it is done this way. Can someone explain the above code? It comes from https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/

Please note: I have contacted the author of the article but received no response.


Solution

  • Just saw your question. Anyway, in case anyone else has the same question, here is an explanation.

    model.fit() fits the training data, with the batch size set to 512 in your case. The input is a pair: the source text (x_tr) and the summary excluding its last token (y_tr[:, :-1]), which serves as the decoder input. The target is the summary shifted one step forward — every token starting from the second (y_tr[..., 1:]) — reshaped to a 3-D tensor so that each timestep carries a single integer label for the sparse loss. In other words, at every step the decoder is trained to predict the next word of the summary given the previous words (this is teacher forcing). The validation data is prepared in exactly the same way so that validation can run during training.
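A minimal sketch with toy NumPy arrays (the token IDs here are made up purely for illustration) showing what the slicing and reshaping in that fit call actually produce:

```python
import numpy as np

# Toy batch: 2 summaries, each 4 tokens long (already integer-encoded).
# Pattern per row: [start, word1, word2, end]
y_tr = np.array([[1, 5, 9, 2],
                 [1, 7, 3, 2]])

# Decoder input: every token except the last one.
decoder_input = y_tr[:, :-1]                                       # shape (2, 3)

# Decoder target: every token except the first, reshaped to 3-D so each
# timestep holds one integer label (what sparse categorical crossentropy
# over a TimeDistributed output layer expects).
decoder_target = y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:]  # shape (2, 3, 1)

print(decoder_input)
# [[1 5 9]
#  [1 7 3]]
print(decoder_target[:, :, 0])
# [[5 9 2]
#  [7 3 2]]
```

At each timestep the input token is paired with the next token as its target, so the model learns "given the words so far, predict the next word of the summary".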