I get the following error when trying to fine-tune a BERT model for sentiment analysis. I load a pretrained model and then build the dataset, but fitting always fails with the same error about the shapes of logits and labels.
As input I use X, a list of strings containing tweets, and y, a numeric list of labels (0 = negative, 1 = positive) obtained by mapping the categories "negative" and "positive" to 0 and 1.
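For illustration, the inputs look roughly like this (made-up examples, not my real data):
X = ["i love this product", "worst experience ever"]
y = [1, 0]  # 1 = positive, 0 = negative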
import tensorflow as tf
from tensorflow.keras.losses import BinaryCrossentropy
from transformers import BertTokenizer, TFBertModel

# LOAD MODEL
hugging_face_model = 'distilbert-base-uncased-finetuned-sst-2-english'
batches = 32
epochs = 1
tokenizer = BertTokenizer.from_pretrained(hugging_face_model)
model = TFBertModel.from_pretrained(hugging_face_model, num_labels=2)

# PREPARE THE DATASET
# Create a list of strings (tweets)
lst = list(X_train_lower['lower_text'].values)
encoded_input = tokenizer(lst, truncation=True, padding=True, return_tensors='tf')
# Map the categories to numeric labels
y_train['sentimentNumber'] = y_train['sentiment'].replace({'negative': 0, 'positive': 1})
label_list = list(y_train['sentimentNumber'].values)

# CREATE DATASET
train_dataset = tf.data.Dataset.from_tensor_slices((dict(encoded_input), label_list))

# COMPILE AND FIT THE MODEL
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(train_dataset.shuffle(len(df)).batch(batches), epochs=epochs, batch_size=batches)
ValueError                                Traceback (most recent call last)
<ipython-input-158-e5b63f982311> in <module>()
----> 1 model.fit(train_dataset.shuffle(len(df)).batch(batches),epochs=epochs,batch_size=batches)

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
   1145     except Exception as e:  # pylint:disable=broad-except
   1146       if hasattr(e, "ag_error_metadata"):
-> 1147         raise e.ag_error_metadata.to_exception(e)
   1148       else:
   1149         raise

ValueError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1000, in train_step
        loss = self.compiled_loss(y, y_pred, sample_weight, regularization_losses=self.losses)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 141, in __call__
        losses = call_fn(y_true, y_pred)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 245, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/usr/local/lib/python3.7/dist-packages/keras/losses.py", line 1932, in binary_crossentropy
        backend.binary_crossentropy(y_true, y_pred, from_logits=from_logits),
    File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 5247, in binary_crossentropy
        return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

    ValueError: `logits` and `labels` must have the same shape, received ((None, 512, 768) vs (None, 1)).
As described in this Kaggle notebook, you must build a custom Keras model around the pre-trained BERT model to perform classification, because TFBertModel is "the bare Bert Model transformer outputting raw hidden-states without any specific head on top". That is exactly what the error is telling you: the loss received the last hidden state of shape (None, 512, 768) as the logits and compared it against labels of shape (None, 1).
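You can confirm this by inspecting the bare model's outputs (a quick check, assuming the `tokenizer` and `model` from the question):
sample = tokenizer(["great movie!"], padding='max_length', max_length=512, return_tensors='tf')
outputs = model(sample)
print(outputs.last_hidden_state.shape)  # (1, 512, 768): what the loss received as "logits"
print(outputs.pooler_output.shape)      # (1, 768): the pooled [CLS] vector used by the classifier below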
Here is a copy of the relevant piece of code:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

def create_model(bert_model):
    input_ids = tf.keras.Input(shape=(60,), dtype='int32')
    attention_masks = tf.keras.Input(shape=(60,), dtype='int32')
    output = bert_model([input_ids, attention_masks])
    # The second element of the output is the pooled [CLS] representation
    output = output[1]
    # Classification head on top of the pooled output
    output = tf.keras.layers.Dense(32, activation='relu')(output)
    output = tf.keras.layers.Dropout(0.2)(output)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    model = tf.keras.models.Model(inputs=[input_ids, attention_masks], outputs=output)
    model.compile(Adam(learning_rate=6e-6), loss='binary_crossentropy', metrics=['accuracy'])
    return model
Note: you might have to adapt this code, in particular the Input shape: judging from the error message, your tokenizer produces sequences of length 512, not 60 (your tokenizer's maximum length).
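One way to keep the shapes consistent is to pad and truncate to a fixed length and use the same value in the Input layers (a sketch; `MAX_LEN` is an assumed name):
MAX_LEN = 512  # must match shape=(MAX_LEN,) in the Input layers of create_model

encoded_input = tokenizer(lst, truncation=True, padding='max_length',
                          max_length=MAX_LEN, return_tensors='tf')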
Load the BERT model and build the classifier:
from transformers import TFBertModel
bert_model = TFBertModel.from_pretrained(hugging_face_model)
model = create_model(bert_model)
model.summary()
Summary:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 60)] 0 []
input_2 (InputLayer) [(None, 60)] 0 []
tf_bert_model_1 (TFBertModel) TFBaseModelOutputWi 109482240 ['input_1[0][0]',
thPoolingAndCrossAt 'input_2[0][0]']
tentions(last_hidde
n_state=(None, 60,
768),
pooler_output=(Non
e, 768),
past_key_values=No
ne, hidden_states=N
one, attentions=Non
e, cross_attentions
=None)
dense (Dense) (None, 32) 24608 ['tf_bert_model_1[0][1]']
dropout_74 (Dropout) (None, 32) 0 ['dense[0][0]']
dense_1 (Dense) (None, 1) 33 ['dropout_74[0][0]']
==================================================================================================
Total params: 109,506,881
Trainable params: 109,506,881
Non-trainable params: 0
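To train the classifier, pass the input IDs and attention masks as the two inputs (a sketch, assuming `encoded_input` and `label_list` from the question, tokenized with the same fixed length as the Input layers):
import numpy as np

X_ids = encoded_input['input_ids']
X_mask = encoded_input['attention_mask']
y = np.array(label_list)

model.fit([X_ids, X_mask], y, batch_size=batches, epochs=epochs)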