I am using BERT to extract features for a word given the text it appears in, but it seems the current implementation in BERT's official GitHub repo (https://github.com/google-research/bert) can only compute features for all the words in the text, which consumes too many resources. Is it possible to adapt it for this purpose? Thanks!!
BERT is a contextual model, not a context-free one, which means you don't want to use it on a single word the way you would use word2vec. That's kind of the point, really: you want to contextualise your input. You can feed it a one-word sentence, but then why not just use word2vec?
Here's what the README says:
> Pre-trained representations can also either be context-free or contextual, and contextual representations can further be unidirectional or bidirectional. Context-free models such as word2vec or GloVe generate a single "word embedding" representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank. Contextual models instead generate a representation of each word that is based on the other words in the sentence.
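So if what you actually need is the vector for one word in context, you still have to run the whole sentence through the model; you just keep that word's hidden states and discard the rest. Here's a minimal sketch using the Hugging Face transformers library instead of the repo's extract_features.py script (the model name, example sentence, and averaging of subword pieces are my choices, not anything prescribed by the official repo):

```python
# Sketch: extract the contextual vector of one target word.
# Assumes `pip install transformers torch`; bert-base-uncased is just an example model.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "I deposited cash at the river bank"
target_word = "bank"

# BERT still has to see the whole sentence to contextualise the word.
encoding = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
hidden = outputs.last_hidden_state[0]  # shape: (seq_len, hidden_size)

# Map each WordPiece token back to its source word (special tokens map to None).
# This assumes the sentence has no punctuation, so whitespace splitting matches
# the tokenizer's pre-tokenization.
word_ids = encoding.word_ids()
target_idx = sentence.split().index(target_word)
positions = [i for i, wid in enumerate(word_ids) if wid == target_idx]

# Average the subword vectors into a single vector for the word.
word_vector = hidden[positions].mean(dim=0)
print(word_vector.shape)  # torch.Size([768]) for bert-base
```

The forward pass over the full sentence is unavoidable, since the word's representation depends on every other token; the saving comes from only keeping (and only post-processing) the vectors at the target positions.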
Hope that makes sense :-)