I am working on an NLP LSTM named-entity recognition model, but I am running into different errors; more details are below. I am running this code in a Jupyter notebook.
TensorFlow version: 2.9
Both the input and output sequences are of length 50.
input sentence: [123 88 170 221 132 52 105 32 211 91 126 211 24 221 134 154 221 162 215 80 144 101 61 136 68 133 40 200 133 40 218 131 139 199 124 74 184 92 213 185 221 221 221 221 221 221 221 221 221 221]
output sentence labels: [ 7 7 7 7 0 7 6 2 7 5 1 7 7 7 7 7 7 7 7 10 7 7 7 7 3 8 7 3 8 7 7 7 7 7 7 7 7 6 2 7 7 7 7 7 7 7 7 7 7 7]
I added up to 5 layers to train the model. Here is the model:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_words, 50, input_length=50),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(num_tags, activation='softmax')
])
If I use the loss function "categorical_crossentropy", I get this error: ValueError: Shapes (None, 50) and (None, 11) are incompatible
If I use the loss function "sparse_categorical_crossentropy", I get this error: logits and labels must have the same first dimension, got logits shape [13,11] and labels shape [650] [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
I tried adding an input shape as the first layer, but still no luck: tf.keras.layers.Input(shape=(max_len,))
Can anyone help me figure out how to solve this? I have tried different approaches but with no luck.
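For reference, here is roughly how I compile and fit the model (a minimal sketch; X_train and y_train are placeholders for my padded token and label arrays, both of shape (num_samples, 50)):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # or 'categorical_crossentropy'
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=5)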
Here is the model summary:
Layer (type)                       Output Shape          Param #
=================================================================
embedding_18 (Embedding)           (None, 50, 50)        11100
bidirectional_35 (Bidirectional)   (None, 50, 128)       58880
bidirectional_36 (Bidirectional)   (None, 64)            41216
dropout_17 (Dropout)               (None, 64)            0
dense_35 (Dense)                   (None, 64)            4160
dense_36 (Dense)                   (None, 11)            715
=================================================================
Total params: 116,071
Trainable params: 116,071
Non-trainable params: 0
_________________________________________________________________
I think the problem is in the last two layers. As written, the network produces a single set of num_tags (11) outputs per input sequence, because the second LSTM has return_sequences=False and drops the time dimension (see the (None, 64) shape in your summary).
But you want num_tags outputs at each step of the sequence, not just at the end. To achieve this, set return_sequences=True on the second Bidirectional LSTM as well, and wrap the Dense layers in a TimeDistributed layer:
tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation='relu')),
tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_tags, activation='softmax'))
Then you can use the "sparse_categorical_crossentropy" loss function, since your labels are integer class indices rather than one-hot vectors.
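Putting it together, here is a minimal sketch of the corrected model (num_words, num_tags, and the layer sizes are taken from your code above):

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_words, 50, input_length=50),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    # return_sequences=True here keeps the output 3D: (batch, 50, 64)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation='relu')),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_tags, activation='softmax'))
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

The output shape is now (None, 50, num_tags), which matches integer labels of shape (None, 50). If you would rather use "categorical_crossentropy", one-hot encode the labels first, e.g. with tf.keras.utils.to_categorical(y_train, num_classes=num_tags), so they have shape (None, 50, num_tags).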
For a worked example, see: https://towardsdatascience.com/named-entity-recognition-ner-using-keras-bidirectional-lstm-28cd3f301f54