I have some confusion regarding TensorFlow's input_shape.
Suppose there are 3 documents (each row) in doc defined below, and the vocabulary has 4 words (each sublist in each row). Further suppose that each word is represented by 2 numbers via a word embedding.
The program only works when I specify input_shape=(3,4,2) under a Dense layer. But when I use an LSTM layer, it only works with input_shape=(4,2), not with input_shape=(3,4,2).
How should I specify the input shape for such inputs, and how do I make sense of it?
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.optimizers import Adam

# 3 documents x 4 words x 2 embedding dimensions -> shape (3, 4, 2)
doc = np.array([
    [[1, 0], [0, 0], [0, 0], [0, 0]],
    [[0, 0], [1, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 0], [1, 0], [0, 0]],
])
model = Sequential()
model.add(Dense(2, input_shape=(3, 4, 2)))  # model.add(LSTM(2, input_shape=(4, 2)))
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])  # metrics should be a list, not a bare string
model.summary()
output = model.predict(doc)
print(model.weights)
print(output)
The input_shape argument of a keras.layers.LSTM layer expects the shape of a single sample: a 2D array of [timesteps, features]. Your doc has the shape [batch_size, timesteps, features], which is one dimension too many.
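To make the shapes concrete, here is a minimal sketch (the np.array conversion is just to make the shape explicit):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

doc = np.array([
    [[1, 0], [0, 0], [0, 0], [0, 0]],
    [[0, 0], [1, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 0], [1, 0], [0, 0]],
])
print(doc.shape)  # (3, 4, 2) = (batch_size, timesteps, features)

model = Sequential()
model.add(LSTM(2, input_shape=(4, 2)))  # per-sample shape only; the batch dim is implicit
print(model.predict(doc).shape)  # (3, 2): one 2-unit LSTM output per document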
You can use the batch_input_shape argument instead if you want to fix the batch_size as well.
To do so, you just have to replace this line of your code:
model.add(LSTM(2, input_shape=(4, 2)))
With this one:
model.add(LSTM(2, batch_input_shape=(3, 4, 2)))
If you set a specific batch_size in your model and then feed it a batch of any other size (anything other than 3, in your case), you will get an error. With input_shape, by contrast, you have the flexibility to feed any batch size to the network.
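A quick way to see the difference (a sketch; the zero arrays are just placeholder inputs with the right shape):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

# flexible: the batch dimension is left as None
flexible = Sequential([LSTM(2, input_shape=(4, 2))])
print(flexible.predict(np.zeros((3, 4, 2))).shape)  # (3, 2)
print(flexible.predict(np.zeros((5, 4, 2))).shape)  # (5, 2): any batch size is accepted

# fixed: the batch dimension is pinned to 3
fixed = Sequential([LSTM(2, batch_input_shape=(3, 4, 2))])
print(fixed.predict(np.zeros((3, 4, 2))).shape)  # (3, 2)
# fixed.predict(np.zeros((5, 4, 2)))  # raises an error: the model expects batch size 3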