Tags: keras, nlp, lstm, tensorflow2.0, named-entity-recognition

LSTM named entity recognition model - shapes are incompatible or logits/labels have different dimensions - TensorFlow 2.9


I am working on an NLP LSTM named entity extraction model but am running into different errors; more details are below. I am running this code in a Jupyter notebook.

TensorFlow version: 2.9

Both input and output sequences are of length 50.

input sentence: [123 88 170 221 132 52 105 32 211 91 126 211 24 221 134 154 221 162 215 80 144 101 61 136 68 133 40 200 133 40 218 131 139 199 124 74 184 92 213 185 221 221 221 221 221 221 221 221 221 221]

output sentence labels: [ 7 7 7 7 0 7 6 2 7 5 1 7 7 7 7 7 7 7 7 10 7 7 7 7 3 8 7 3 8 7 7 7 7 7 7 7 7 6 2 7 7 7 7 7 7 7 7 7 7 7]

I added up to 5 layers to train the model.

Here is the model:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_words, 50, input_length=50),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(num_tags, activation='softmax')
])

If I use "categorical_crossentropy" as the loss function, I get this error: ValueError: Shapes (None, 50) and (None, 11) are incompatible

If I use "sparse_categorical_crossentropy" as the loss function, I get this error: logits and labels must have the same first dimension, got logits shape [13,11] and labels shape [650] [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

I tried adding an input layer as the first layer, tf.keras.layers.Input(shape=(max_len,)), but still no luck.

Can anyone help me solve this? I have tried different approaches with no luck.

Here is the model summary:

Layer (type)                Output Shape              Param #   
=================================================================
 embedding_18 (Embedding)    (None, 50, 50)            11100     
                                                                 
 bidirectional_35 (Bidirecti  (None, 50, 128)          58880     
 onal)                                                           
                                                                 
 bidirectional_36 (Bidirecti  (None, 64)               41216     
 onal)                                                           
                                                                 
 dropout_17 (Dropout)        (None, 64)                0         
                                                                 
 dense_35 (Dense)            (None, 64)                4160      
                                                                 
 dense_36 (Dense)            (None, 11)                715       
                                                                 
=================================================================
Total params: 116,071
Trainable params: 116,071
Non-trainable params: 0
_________________________________________________________________
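To make the mismatch concrete: the summary shows the last layer producing (None, 11), i.e. one 11-way prediction per sentence, while every label sequence has 50 entries. A minimal check (X_train and y_train are placeholder names for my padded (batch, 50) token-id and tag-id arrays):

print(model.output_shape)  # (None, 11): one prediction per sentence
print(y_train.shape)       # (batch, 50): fifty tag labels per sentence
# so the loss ends up comparing (None, 50) labels against (None, 11) outputs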

Solution

  • I think the problem is in the last two Dense layers. The second Bidirectional LSTM is created without return_sequences=True, so it collapses the time dimension: from that point on the model carries a single 64-dimensional vector per sentence, and the final Dense layer emits num_tags numbers (11) once per sentence instead of once per token. That is exactly what both errors report: your labels have shape (None, 50) while the model's output has shape (None, 11), and with the sparse loss a batch of 13 label sequences of length 50 is flattened to 650 labels against only 13 rows of logits.

    But you want num_tags outputs at each step of the sequence, not only at the end. To achieve this, pass return_sequences=True to the second LSTM as well and wrap the Dense layers in a TimeDistributed layer:

    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation='relu')),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_tags, activation='softmax'))

    Then you can use "sparse_categorical_crossentropy" as the loss function, since your labels are integer tag ids.
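
    For reference, here is a minimal sketch of the whole corrected model under the question's setup (num_words and num_tags as defined there, sequence length 50); every layer now keeps the time dimension, so the final output shape is (None, 50, num_tags):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(num_words, 50, input_length=50),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        # return_sequences=True here as well, so all 50 steps reach the Dense layers
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(64, activation='relu')),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(num_tags, activation='softmax'))
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    # model.summary() should now end with (None, 50, 11) rather than (None, 11)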

    Please see this example: https://towardsdatascience.com/named-entity-recognition-ner-using-keras-bidirectional-lstm-28cd3f301f54
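
    As a quick end-to-end check with made-up data (num_words=222 and num_tags=11 can be read off the Embedding and last Dense param counts in the summary, and 13 is the batch size from the error message), the corrected model above fits without either error:

    import numpy as np

    num_words, num_tags, max_len = 222, 11, 50  # inferred from the model summary
    X = np.random.randint(0, num_words, size=(13, max_len))  # dummy token ids
    y = np.random.randint(0, num_tags, size=(13, max_len))   # dummy tag ids
    model.fit(X, y, epochs=1)  # logits (13, 50, 11) vs labels (13, 50): compatible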