Input Shape Error When Adding an Embedding Layer to an LSTM

I'm trying to add an embedding layer to my LSTM that predicts characters.

I tried adding the embedding layer like this:

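num_words_in_vocab = 83
max_sentence_length = 40
embedding_dim = 32  # assumed value; the original output dimension is not shown

# build the model: an Embedding layer followed by a single LSTM
model = Sequential()
model.add(Embedding(num_words_in_vocab, embedding_dim, input_length=max_sentence_length))
model.add(LSTM(256, return_sequences=True))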

However, Keras throws this error:

Error when checking input: expected embedding_8_input to have 2 dimensions, but got array with shape (36736, 40, 83)

I'm confused because the embedding layer has no parameter for the number of examples in the dataset, and I'm not sure how to reshape the data to make it work with the embedding layer.

Here is my full code.

# -*- coding: utf-8 -*-
import re
import sys
import random
import requests
import numpy as np
import keras.backend as K
from keras import Input, Model
from keras.layers import Permute, multiply, Embedding
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils
from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint
from sklearn.model_selection import train_test_split

#loading book data
html = requests.get("")
text = html.text
#removing some garbage
text = re.sub(r'[^\x00-\x7f]',r'', text)

#making the word plot, but not using it to train bc 57 chars is better than X,xxx words.
split_text = text.splitlines()

def cleanText(text):
  #drop empty lines
  cleanWords = []
  for exerpt in text:
    if exerpt == '':
      continue
    cleanWords.append(exerpt)
  #take the clean words and make a LIST of clean words
  clean_word_list = []
  for exerpt in cleanWords:
    temp_list = exerpt.split()
    for word in temp_list:
      if word not in clean_word_list:
        clean_word_list.append(word)
  #init dict for counting top 50 words
  dict_prevelence = {}
  for exerpt in cleanWords:
    temp_list = exerpt.split()
    for word in temp_list:
      #if not in dict, add to dict_prevelence, else, increment val
      if word not in dict_prevelence:
        dict_prevelence[word] = 1
      else:
        dict_prevelence[word] += 1
  return clean_word_list, dict_prevelence

#cleaning up the alice in wonderland and getting unsorted prevelence dict
clean_word_list, dict_prevelence = cleanText(split_text)
#sorting dict
dict_prevelence = sorted(dict_prevelence.items(), key=lambda x: x[1], reverse=True)

processed_text = text

#getting list of unique chars
chars = sorted(list(set(processed_text)))
print('Total Unique Chars:', len(chars))
#making dicts so we can translate between the two
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

#cutting the text into strings of maxlen (40) chars, but stepping by 3 chars
#each time b/c if we stepped by 1, almost all of each string would be the same and
#it wouldn't train that fast.

#!!! I'm guessing this is kind of a good middle ground between using words and chars as the data,
#with words you get a lot more context from each, but with letters there isn't a huge overhead of empty 
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(processed_text) - maxlen, step):
    sentences.append(processed_text[i: i + maxlen])
    next_chars.append(processed_text[i + maxlen])

#here we're making the empty data arrays
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
#now we one-hot encode each 'sentence' (each one starts 3 chars after the previous one).
#so each x entry is a (maxlen, len(chars)) one-hot matrix for one overlapping sentence
#and each y entry is the one-hot NEXT char that would follow that sentence.
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

#add a thing here for test train split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, shuffle=False)

print('X_train Data Shape:', X_train.shape)
print('y_train Data Shape:', y_train.shape)

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

#putting in this dope thing called callbacks so we can save weights in case we die during training like we have been.
# checkpoint: val_loss should be minimized, so the best model is the one with the lowest value
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')

#TRAIN, THAT, MODEL!!
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=25, batch_size=64, verbose=1, callbacks=[checkpoint])

Any help would be great!


  • Regarding the number of samples: Keras infers it from the first dimension of the input data (X_train in this case), so no layer takes it as a parameter.

    As for the embedding layer, it maps each integer index to a dense vector, so it expects a 2D integer array and outputs a 3D tensor. The array you pass in is already one-hot encoded (in the step where you populate "x") into shape (36736, 40, 83), which is why Keras complains about the number of dimensions. You might, instead, want to let the embedding layer compute a vector for each index. To do this, I believe you would modify "x" to be of shape (num_of_sentences, num_of_chars_per_sentence), where each value is the char index of that particular character.
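
    A minimal sketch of that reshaping, assuming NumPy and a short made-up string standing in for the book text (the variable names follow the question). Each sentence becomes a row of character indices rather than a one-hot matrix:

```python
import numpy as np

# Made-up stand-in for the question's processed_text (assumption, not the real data).
processed_text = "alice was beginning to get very tired of sitting by her sister"
chars = sorted(set(processed_text))
char_indices = {c: i for i, c in enumerate(chars)}

maxlen, step = 40, 3
starts = range(0, len(processed_text) - maxlen, step)
sentences = [processed_text[i:i + maxlen] for i in starts]
next_chars = [processed_text[i + maxlen] for i in starts]

# One integer per character: shape (num_sentences, maxlen), which is 2D.
x = np.array([[char_indices[c] for c in s] for s in sentences], dtype=np.int32)

# The targets can stay one-hot, as in the question: shape (num_sentences, len(chars)).
y = np.zeros((len(sentences), len(chars)), dtype=bool)
y[np.arange(len(sentences)), [char_indices[c] for c in next_chars]] = True

# Rebuild the one-hot version only to show that argmax recovers the same indices.
x_one_hot = np.eye(len(chars), dtype=bool)[x]
x_from_one_hot = x_one_hot.argmax(axis=-1)

print(x.shape, y.shape)
```

    The resulting "x" has two dimensions, which is what the embedding layer asks for. If the one-hot array from the question already exists, x.argmax(axis=-1) should give the same index array without rebuilding it from the text.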

    Also, you might want to set return_sequences to False on the LSTM. Each "y" row holds a single next character, so I believe you only need the final output of that layer.
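
    Putting both suggestions together, the model could look roughly like this. It is only a sketch: embedding_dim = 32, the Dense softmax output layer, and the random batch used for the shape check are my assumptions, not something taken from your code.

```python
import numpy as np
from keras import Input
from keras.layers import Dense, Embedding, LSTM
from keras.models import Sequential

vocab_size = 83      # number of unique characters, from the question
maxlen = 40          # characters per input sentence, from the question
embedding_dim = 32   # assumed value; tune it for your data

model = Sequential()
# The input shape leaves out the number of samples; Keras takes that from the data.
model.add(Input(shape=(maxlen,), dtype="int32"))
# Maps each character index to a learned vector of size embedding_dim.
model.add(Embedding(vocab_size, embedding_dim))
# return_sequences defaults to False, so only the last time step is returned.
model.add(LSTM(256))
# Assumed output layer: one probability per character, matching the one-hot y.
model.add(Dense(vocab_size, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# Random integer batch standing in for real data, only to check the shapes.
x_batch = np.random.randint(0, vocab_size, size=(8, maxlen))
probs = model.predict(x_batch, verbose=0)
print(probs.shape)
```

    With return_sequences=True the LSTM would instead return one vector per time step, which would not line up with a "y" of shape (num_of_sentences, num_of_chars_in_vocab).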

    I hope this helps.