machine-learning deep-learning nlp bert-language-model

Why is a throw-away column required in the BERT format?

I have recently come across BERT (Bidirectional Encoder Representations from Transformers). I saw that BERT requires a strict format for the training data. The third required column is described as follows:

Column 3: A column of all the same letter — this is a throw-away column that you need to include because the BERT model expects it.

What is a throw-away column, and why is it needed in the dataset if every row contains the same letter?

Thank you.

Solution

BERT was pre-trained on two tasks: masked language modelling and next sentence prediction.

The third column, as you call it, is used only in next sentence prediction and in downstream tasks that take multiple sentences as input, such as question answering. For single-sentence tasks it carries no information, which is why it is described as a throw-away column. In the multi-sentence cases the value won't just be the same letter (or 0) for everything: sentence 1 will be all 0 while sentence 2 will be all 1, indicating that the former is sentence A and the latter is sentence B.
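
To make the 0/1 marking concrete, here is a minimal sketch of how such segment values could be built for a single sentence and for a sentence pair. It is an illustration only: `build_segment_ids` is a hypothetical helper, not part of any BERT library, and whitespace splitting stands in for BERT's real WordPiece tokenizer. The `[CLS]` and `[SEP]` markers follow the usual BERT input layout.

```python
def build_segment_ids(tokens_a, tokens_b=None):
    """Return (tokens, segment_ids) for one sentence or a sentence pair.

    Hypothetical helper for illustration; not a real BERT API.
    """
    # Sentence A, wrapped in [CLS] ... [SEP], is marked with 0 throughout.
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)

    # Sentence B, if present, is appended with its own [SEP] and marked with 1.
    if tokens_b:
        part_b = list(tokens_b) + ["[SEP]"]
        tokens += part_b
        segment_ids += [1] * len(part_b)

    return tokens, segment_ids


# Single-sentence task: every position gets the same value.
single_tokens, single_ids = build_segment_ids("the cat sat".split())
print(single_ids)  # [0, 0, 0, 0, 0]

# Sentence-pair task (e.g. question + passage): 0 for A, 1 for B.
pair_tokens, pair_ids = build_segment_ids("who sat".split(), "the cat sat".split())
print(pair_ids)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With a single sentence the values are constant and carry no information, while a sentence pair is the case where the values actually distinguish A from B.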