I am trying to create a basic encoder-decoder model for training a chatbot. X contains the questions (human dialogue) and Y contains the bot's answers. I have padded the sequences to the maximum length of the input and output sentences respectively, so X.shape = (2363, 242, 1) and Y.shape = (2363, 144, 1). But during training the loss is 'nan' for every epoch, and the predictions are arrays in which every value is 'nan'. I have tried the 'rmsprop' optimizer instead of 'adam'. I cannot use the 'categorical_crossentropy' loss because the output is not one-hot encoded but a sequence. What exactly is wrong with my code?
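```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense
model = Sequential()
model.add(LSTM(units=64, activation='relu', input_shape=(X.shape[1], 1)))  # encoder: (242, 1) -> (64,)
model.add(RepeatVector(Y.shape[1]))  # repeat the encoding once per output step: (64,) -> (144, 64)
model.add(LSTM(units=64, activation='relu', return_sequences=True))  # decoder: (144, 64) -> (144, 64)
model.add(TimeDistributed(Dense(1)))  # one value per step, matching Y's shape (144, 1)
model.compile(optimizer='adam', loss='mean_squared_error')
hist = model.fit(X, Y, epochs=20, batch_size=64, verbose=2)
```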
Data Preparation
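```python
import string

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences


def remove_punctuation(s):
    # Drop ASCII punctuation, then any non-ASCII characters.
    s = s.translate(str.maketrans('', '', string.punctuation))
    s = s.encode('ascii', 'ignore').decode('ascii')
    return s


def prepare_data(fname):
    # Map every word to an integer index; 0 is reserved for padding.
    word2idx = {'PAD': 0}
    curr_idx = 1
    sents = list()
    with open(fname) as f:
        for line in f:
            line = line.strip()
            if line:
                tokens = remove_punctuation(line.lower()).split()
                tmp = []
                for t in tokens:
                    if t not in word2idx:
                        word2idx[t] = curr_idx
                        curr_idx += 1
                    tmp.append(word2idx[t])
                sents.append(tmp)
    # Post-pad every sentence with 0 to the length of the longest one.
    sents = np.array(pad_sequences(sents, padding='post'))
    return sents, word2idx


human = 'rdany-conversations/human_text.txt'
robot = 'rdany-conversations/robot_text.txt'
X, input_vocab = prepare_data(human)
Y, output_vocab = prepare_data(robot)
# Add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1)
X = X.reshape((X.shape[0], X.shape[1], 1))
Y = Y.reshape((Y.shape[0], Y.shape[1], 1))
```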
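To see what the network is actually fed, here is a minimal stdlib-only sketch of what `prepare_data` is meant to produce, run over an in-memory list of lines instead of a file. `prepare_lines` and `pad_post` are hypothetical names used only for illustration; `pad_post` mimics the post-padding behaviour of Keras `pad_sequences` and ignores its truncation options. It shows that every timestep is a raw integer word index (0 for padding), so the input values grow with the vocabulary size instead of lying in a small, standardized range.

```python
import string


def remove_punctuation(s):
    # Strip ASCII punctuation, then drop any non-ASCII characters.
    s = s.translate(str.maketrans('', '', string.punctuation))
    return s.encode('ascii', 'ignore').decode('ascii')


def pad_post(seqs, pad_value=0):
    # Hypothetical stand-in for pad_sequences(..., padding='post'):
    # append pad_value until every sequence is as long as the longest one.
    max_len = max((len(s) for s in seqs), default=0)
    return [s + [pad_value] * (max_len - len(s)) for s in seqs]


def prepare_lines(lines):
    # Same indexing logic as prepare_data, over a list of strings.
    word2idx = {'PAD': 0}
    sents = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ids = []
        for t in remove_punctuation(line.lower()).split():
            if t not in word2idx:
                word2idx[t] = len(word2idx)  # next unused index
            ids.append(word2idx[t])
        sents.append(ids)
    return pad_post(sents), word2idx


X, vocab = prepare_lines(["Hello, there!", "", "How are you?", "hello"])
print(X)  # [[1, 2, 0], [3, 4, 5], [1, 0, 0]]
```

With a real vocabulary of a few thousand words, the largest index in X is in the thousands, which is worth keeping in mind when reading the advice about standardizing inputs.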
First of all, check that your input contains no NaNs. If it doesn't, the problem might be exploding gradients. Standardize your inputs (min-max or z-score scaling), try a smaller learning rate, clip the gradients, and try a different weight initialization scheme.
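The NaN check, the two scaling options and gradient clipping can be sketched in plain NumPy to make clear what each one does. The helper names `has_nan`, `min_max_scale`, `z_scale` and `clip_by_global_norm` are hypothetical; this is an illustration of the techniques, not how Keras implements them internally.

```python
import numpy as np


def has_nan(*arrays):
    # True if any of the given arrays contains at least one NaN.
    return any(np.isnan(a).any() for a in arrays)


def min_max_scale(x):
    # Map values linearly onto [0, 1]; a constant array maps to all zeros.
    lo, hi = x.min(), x.max()
    span = hi - lo
    return (x - lo) / span if span > 0 else np.zeros_like(x, dtype=float)


def z_scale(x):
    # Shift to zero mean and scale to unit standard deviation.
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x, dtype=float)


def clip_by_global_norm(grads, max_norm):
    # Rescale all gradient arrays together so their combined L2 norm
    # is at most max_norm; gradients already within the limit are returned as-is.
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total <= max_norm:
        return grads
    return [g * (max_norm / total) for g in grads]


# Usage on a toy batch of word indices (0 = padding).
X = np.array([[0, 2, 4], [4, 0, 0]])
print(has_nan(X))  # False
X_scaled = min_max_scale(X)  # every value now lies in [0, 1]
clipped = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)  # norm 5 -> norm 1
```

In Keras you would not clip by hand: optimizers accept `clipnorm` (clips each gradient tensor separately), `clipvalue`, and `global_clipnorm` (the same rule as the sketch), alongside `learning_rate`, for example `Adam(learning_rate=1e-4, global_clipnorm=1.0)` passed as `optimizer=` to `model.compile`. The values 1e-4 and 1.0 are only illustrative starting points. The initialization scheme of an `LSTM` layer can be changed through its `kernel_initializer` and `recurrent_initializer` arguments. If you also scale Y, the predictions come out in the scaled range and have to be mapped back.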