Tags: tensorflow, google-colaboratory, transformer-model

The evaluation section leads to a bug


Hi, there is a problem when running the code.

It pops up an error and cannot generate the result:

Error: ValueError: Input 2 is incompatible with layer model_2: expected shape=(None, 50), found shape=(None, 51)

Is there any solution for this? Much obliged.

The part that triggers the bug is dropped below. Full error:

ValueError: in user code:

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1478 predict_function  *
    return step_function(self, iterator)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:1468 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/tpu_strategy.py:540 run
    return self.extended.tpu_run(fn, args, kwargs, options)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/tpu_strategy.py:1296 tpu_run
    return func(args, kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/distribute/tpu_strategy.py:1364 tpu_function
    xla_options=tpu.XLAOptions(use_spmd_for_xla_partitioning=False))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/tpu/tpu.py:968 replicate
    xla_options=xla_options)[1]
/opt/conda/lib/python3.7/site-packages/tensorflow/python/tpu/tpu.py:1439 

Solution

  • It looks like during training the code drops the last element of train_data['decoder_inputs_ids'] and train_data['decoder_attention_mask'], while during prediction it does not:

    model.fit(x=[train_data['input_ids'],
                 train_data['attention_mask'],
                 train_data['decoder_inputs_ids'][:,:-1],
                 train_data['decoder_attention_mask'][:,:-1]],
    
    pred = model.predict([input_ids, attention_mask, decoder_inputs_ids, decoder_attention_mask])
    

    That's why during inference the decoder inputs have shape (None, 51), while the model expects shape=(None, 50).
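    A minimal NumPy sketch of the mismatch (the batch size of 8 and max_len_sum = 51 here are assumptions taken from the error message, not from the original code):

    ```python
    import numpy as np

    max_len_sum = 51                    # assumed from found shape=(None, 51)
    decoder_inputs_ids = np.zeros((8, max_len_sum), dtype=np.int32)

    # Training slices off the last column, so the model is built for length 50...
    print(decoder_inputs_ids[:, :-1].shape)   # (8, 50)
    # ...but predict() receives the unsliced array of length 51.
    print(decoder_inputs_ids.shape)           # (8, 51)
    ```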

    You can pad decoder_inputs_ids and decoder_attention_mask to max_len_sum-1 (instead of max_len_sum) during prediction:

    # Pad sequences to max_len_sum-1 (instead of the original max_len_sum).
    decoder_inputs_ids = tf.keras.preprocessing.sequence.pad_sequences(
        [decoder_inputs_ids[:-1]], maxlen=max_len_sum - 1,
        padding='post', truncating='post')
    decoder_attention_mask = tf.keras.preprocessing.sequence.pad_sequences(
        [decoder_attention_mask[:-1]], maxlen=max_len_sum - 1,
        padding='post', truncating='post')