Tags: python-3.x, tensorflow, tf.keras, named-entity-recognition, crf

Using tfa.layers.CRF on top of a BiLSTM


I am trying to implement an NER model based on a CRF with the tensorflow-addons library. The model takes sequences of words in word-to-index and character-level format, concatenates their embeddings, and feeds them to a BiLSTM layer. Here is the implementation:

import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from tensorflow.keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
from tensorflow_addons.layers import CRF

# word-level input and embeddings
word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(input_dim=n_words + 2, output_dim=dim_word_emb,
                     input_length=max_sent_len, mask_zero=True)(word_input)

# character-level input and embeddings, encoded per word by an LSTM
char_input = Input(shape=(max_sent_len, max_word_len,))
char_emb = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=dim_char_emb,
                           input_length=max_word_len, mask_zero=True))(char_input)

char_emb = TimeDistributed(LSTM(units=20, return_sequences=False,
                                recurrent_dropout=0.5))(char_emb)

# main LSTM
main_input = concatenate([word_emb, char_emb])
main_input = SpatialDropout1D(0.3)(main_input)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
                               recurrent_dropout=0.6))(main_input)
kernel = TimeDistributed(Dense(50, activation="relu"))(main_lstm)  
crf = CRF(n_tags+1)  # CRF layer
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel)  # output

model = Model([word_input, char_input], potentials)
model.add_loss(tf.abs(tf.reduce_mean(kernel)))
model.compile(optimizer="rmsprop", loss='categorical_crossentropy')

When I start fitting the model, I get these warnings:

WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.

And the training process goes like this:

Epoch 1/10
438/438 [==============================] - 80s 163ms/step - loss: nan - val_loss: nan
Epoch 2/10
438/438 [==============================] - 71s 163ms/step - loss: nan - val_loss: nan
Epoch 3/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 4/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 5/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 6/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 7/10
438/438 [==============================] - 70s 161ms/step - loss: nan - val_loss: nan
Epoch 8/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 9/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 10/10
438/438 [==============================] - 70s 159ms/step - loss: nan - val_loss: nan

I am almost sure that the problem is with the way I am setting the loss function, but I do not know exactly how it should be set. I also searched for this problem but did not find an answer. Also, when I test my model, it cannot predict the labels correctly and assigns every token the same label. Can anybody explain how I should solve this problem?


Solution

  • Change your loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy(); categorical crossentropy is probably not a good choice here. A sketch of the change is shown below.
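
A minimal sketch of the suggested change, reusing the model from the question. The from_logits=True argument is an assumption on my part, since the CRF layer's potentials output is unnormalized scores rather than probabilities:

import tensorflow_addons as tfa

# Recompile the model with focal loss instead of categorical crossentropy.
# from_logits=True is an assumption here: the CRF's `potentials` output
# consists of unnormalized scores, not probabilities.
model.compile(optimizer="rmsprop",
              loss=tfa.losses.SigmoidFocalCrossEntropy(from_logits=True))

Focal loss down-weights easy, frequent classes, which can help with the heavily imbalanced tag distributions typical of NER data.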