Tags: python, tensorflow, machine-learning, neural-network, bert-language-model

Sequence labeling with BERT for word positions


I have a set of sentences, and in these sentences there are dependencies between some words. I want to train BERT to predict which words have dependencies with which other words.

For example, if I have this sentence:

We were moving around in Paris, which is the capital of France.

(word indices: We = 0, were = 1, moving = 2, around = 3, in = 4, Paris = 5, which = 6, is = 7, the = 8, capital = 9, of = 10, France = 11)

I want BERT to predict, for the word Paris, the position of France; that is, I want to frame the task as a sequence labeling task.


The label for a word would be -1 if there is no relation between this word and any other word in the sentence, and otherwise the index of the related word. In the example above, the word Paris should have the label 11, the index of the word France.
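To make this concrete, the proposed labels for the example sentence would be (ignoring punctuation, since the indices above count only the words):

```python
# Proposed scheme: -1 = no relation, otherwise the index of the related word.
words  = ["We", "were", "moving", "around", "in", "Paris",
          "which", "is", "the", "capital", "of", "France"]
labels = [-1, -1, -1, -1, -1, 11,
          -1, -1, -1, -1, -1, -1]
# Paris (index 5) is labeled 11, the index of France.
```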

Is using the word indices as labels the right way to frame this task?


Solution

  • No. The problem is that in every sentence the position index has an entirely different meaning, so it would be extremely difficult for the network to learn what to do. You can imagine the parameter matrix in the final projection as embeddings of the target classes, and the classification as measuring the similarity of the output state with the class embeddings.

    I suggest doing the classification similarly to what people sometimes do in dependency parsers, i.e., for each pair of words, classify if there is a relation between the words or not.

    BERT gives you a matrix of contextual embeddings for each sentence. Create a 3D tensor out of it, where position [i, j] contains a concatenation of the representations of words i and j. Then, classify each of these pairs as true/false, telling whether there is a dependency link between these two words or not.
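
    A minimal sketch of this pairing-and-scoring step in TensorFlow (matching the question's tags); `hidden` is a stand-in for the contextual embedding matrix BERT returns for one sentence, and the sizes and the small MLP scorer are illustrative assumptions, not a fixed recipe:

    ```python
    import tensorflow as tf

    # Stand-in for BERT's output for one sentence: [seq_len, hidden_dim]
    # (batch dimension omitted for clarity).
    seq_len, hidden_dim = 12, 768
    hidden = tf.random.normal([seq_len, hidden_dim])

    # Build a [seq_len, seq_len, 2 * hidden_dim] tensor where entry [i, j]
    # is the concatenation of the representations of words i and j.
    rows = tf.tile(hidden[:, tf.newaxis, :], [1, seq_len, 1])  # word i, repeated over j
    cols = tf.tile(hidden[tf.newaxis, :, :], [seq_len, 1, 1])  # word j, repeated over i
    pairs = tf.concat([rows, cols], axis=-1)

    # Score every pair with a small MLP; the sigmoid output is the
    # probability of a dependency link between words i and j.
    scorer = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    link_probs = tf.squeeze(scorer(pairs), axis=-1)  # [seq_len, seq_len]
    ```

    Training would then target a [seq_len, seq_len] binary matrix with a 1 at [i, j] wherever word i is related to word j (e.g., a 1 at [5, 11] for the Paris/France pair in the example sentence), using a binary cross-entropy loss.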