Tags: tensorflow, machine-learning, keras, sentiment-analysis, bert-language-model

`logits` and `labels` must have the same shape, received ((None, 512, 768) vs (None, 1)) when using transformers


I get the following error when trying to fine-tune a BERT model for sentiment analysis.

My inputs are: X, a list of strings containing tweets, and y, a numeric list of labels (0 = negative, 1 = positive).

I am fine-tuning a BERT model to predict sentiment, but I always get the same error about logits and labels when I try to fit the model. I load a pretrained model and build the dataset, but the fit call always fails.

The text used as input is a list of tweet strings, and the labels are a list of categories (negative and positive) converted to 0 and 1.
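
For example (made-up data), the inputs look like:

# Made-up illustration of the input format
X = ["i love this phone", "worst service ever"]   # tweets (strings)
y = [1, 0]                                         # 1 = positive, 0 = negative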

import tensorflow as tf
from tensorflow.keras.losses import BinaryCrossentropy
from transformers import BertTokenizer, TFBertModel
from sklearn.preprocessing import MultiLabelBinarizer

#LOAD MODEL

hugging_face_model = 'distilbert-base-uncased-finetuned-sst-2-english'
batches = 32
epochs = 1 

tokenizer = BertTokenizer.from_pretrained(hugging_face_model)
model = TFBertModel.from_pretrained(hugging_face_model, num_labels=2)

#PREPARE THE DATASET

#create a list of strings (tweets)


lst = list(X_train_lower['lower_text'].values) 
encoded_input  = tokenizer(lst, truncation=True, padding=True, return_tensors='tf')

y_train['sentimentNumber'] = y_train['sentiment'].replace({'negative': 0, 'positive': 1})
label_list = list(y_train['sentimentNumber'].values) 

#CREATE DATASET

train_dataset = tf.data.Dataset.from_tensor_slices((dict(encoded_input), label_list))

#COMPILE AND FIT THE MODEL

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5), loss=BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
model.fit(train_dataset.shuffle(len(df)).batch(batches), epochs=epochs, batch_size=batches)




ValueError                                Traceback (most recent call last)
<ipython-input-158-e5b63f982311> in <module>()
----> 1 model.fit(train_dataset.shuffle(len(df)).batch(batches),epochs=epochs,batch_size=batches)

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
   1145           except Exception as e:  # pylint:disable=broad-except
   1146             if hasattr(e, "ag_error_metadata"):
-> 1147               raise e.ag_error_metadata.to_exception(e)
   1148             else:
   1149               raise

ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1000, in train_step
        loss = self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
        losses = call_fn(y_true, y_pred)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1932, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 5247, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

    ValueError: `logits` and `labels` must have the same shape, received ((None, 512, 768) vs (None, 1)).

Solution

  • As described in this kaggle notebook, you must build a custom Keras Model around the pre-trained BERT model to perform classification. TFBertModel has no classification head on top; as the Hugging Face documentation puts it:

    The bare Bert Model transformer outputting raw hidden-states without any specific head on top

    Its output is therefore the raw hidden states of shape (batch, sequence_length, 768), which is why the loss receives logits of shape (None, 512, 768) instead of one value per example. Here is a copy of a piece of code:

    import tensorflow as tf
    from tensorflow.keras.optimizers import Adam

    def create_model(bert_model):
      input_ids = tf.keras.Input(shape=(60,), dtype='int32')
      attention_masks = tf.keras.Input(shape=(60,), dtype='int32')

      output = bert_model([input_ids, attention_masks])
      output = output[1]  # pooler_output: one (batch, 768) vector per example
      output = tf.keras.layers.Dense(32, activation='relu')(output)
      output = tf.keras.layers.Dropout(0.2)(output)

      # a single sigmoid unit gives shape (batch, 1), matching the 0/1 labels
      output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
      model = tf.keras.models.Model(inputs=[input_ids, attention_masks], outputs=output)
      model.compile(Adam(learning_rate=6e-6), loss='binary_crossentropy', metrics=['accuracy'])
      return model
    

    Note: you might have to adapt this code, in particular the Input shape (60 here; seemingly 512 in your case, your tokenizer's maximum length, judging from the error message).
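
    A minimal sketch of preparing the inputs to match that fixed shape (assuming the tokenizer, lst and label_list from the question):

    # Tokenize to a fixed length so it matches the Input shape above
    # (`lst` and `label_list` are assumed from the question's code)
    max_len = 60  # must equal the shape=(60,) used in create_model
    encoded = tokenizer(lst, truncation=True, padding='max_length',
                        max_length=max_len, return_tensors='tf')
    x = [encoded['input_ids'], encoded['attention_mask']]
    y = tf.convert_to_tensor(label_list)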

    Load the BERT model and build the classifier:

    from transformers import TFBertModel
    bert_model = TFBertModel.from_pretrained(hugging_face_model)
    model = create_model(bert_model)
    model.summary()
    

    Summary:

    Model: "model"
    __________________________________________________________________________________________________
     Layer (type)                   Output Shape         Param #     Connected to                     
    ==================================================================================================
     input_1 (InputLayer)           [(None, 60)]         0           []                               
                                                                                                      
     input_2 (InputLayer)           [(None, 60)]         0           []                               
                                                                                                      
     tf_bert_model_1 (TFBertModel)  TFBaseModelOutputWi  109482240   ['input_1[0][0]',                
                                    thPoolingAndCrossAt               'input_2[0][0]']                
                                    tentions(last_hidde                                               
                                    n_state=(None, 60,                                                
                                    768),                                                             
                                     pooler_output=(Non                                               
                                    e, 768),                                                          
                                     past_key_values=No                                               
                                    ne, hidden_states=N                                               
                                    one, attentions=Non                                               
                                    e, cross_attentions                                               
                                    =None)                                                            
                                                                                                      
     dense (Dense)                  (None, 32)           24608       ['tf_bert_model_1[0][1]']        
                                                                                                      
     dropout_74 (Dropout)           (None, 32)           0           ['dense[0][0]']                  
                                                                                                      
     dense_1 (Dense)                (None, 1)            33          ['dropout_74[0][0]']             
                                                                                                      
    ==================================================================================================
    Total params: 109,506,881
    Trainable params: 109,506,881
    Non-trainable params: 0
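
    Once the classifier is built, it can be trained directly on the encoded inputs; a sketch reusing the x and y prepared above:

    # train the custom classifier; plain 0/1 labels work with binary_crossentropy
    model.fit(x, y, batch_size=32, epochs=1)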