python, torch, bert-language-model, embedding, flair

Combining BERT and other types of embeddings


The Flair model can produce a representation for any word (it handles the OOV problem because it works at the character level), while the BERT model splits an unknown word into several sub-words.

For example, the word "hjik" will be represented by a single vector in Flair, while BERT will split it into several sub-words (because it is OOV), each with its own vector. So from Flair we get one vector, while from BERT we might get two or more.
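
To see the splitting concretely, here is a small illustration (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the exact sub-word split depends on the vocabulary):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# An in-vocabulary word stays a single token; an OOV word is split
# into several WordPiece pieces, each of which gets its own vector.
print(tokenizer.tokenize("hello"))  # ['hello']
print(tokenizer.tokenize("hjik"))   # multiple pieces, e.g. ['h', '##ji', '##k']
```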

The question is: how does the flairNLP library handle this issue?

NOTE: If you don't know, can you at least suggest a proper way to handle it?


Solution

  • The TransformerWordEmbeddings class handles words that are split into multiple sub-words out of the box; you control the behaviour with the subtoken_pooling parameter (the choices are "first", "last", "first_last" and "mean"). See the documentation here: https://github.com/flairNLP/flair/blob/master/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md#pooling-operation. A minimal sketch follows below.
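
As an illustration, here is a minimal sketch (assuming flair is installed; the model names "bert-base-uncased" and "news-forward"/"news-backward" are just example choices) that pools BERT sub-words with "mean" and stacks the result with character-level Flair embeddings:

```python
from flair.data import Sentence
from flair.embeddings import (
    FlairEmbeddings,
    StackedEmbeddings,
    TransformerWordEmbeddings,
)

# BERT embeddings: "mean" averages the sub-word vectors, so every
# token ends up with exactly one vector, even an OOV word like "hjik".
bert = TransformerWordEmbeddings("bert-base-uncased", subtoken_pooling="mean")

# Character-level Flair embeddings never split words, so no pooling is needed.
flair_fwd = FlairEmbeddings("news-forward")
flair_bwd = FlairEmbeddings("news-backward")

# Stack them: each token's final embedding is the concatenation of all three.
stacked = StackedEmbeddings([bert, flair_fwd, flair_bwd])

sentence = Sentence("hjik is an out-of-vocabulary word")
stacked.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)  # one vector per token
```

With pooling enabled, every token, including an OOV word such as "hjik", receives exactly one BERT vector, so it lines up one-to-one with the Flair vectors and the concatenation in StackedEmbeddings is well defined.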