Before answering "yes, of course", let me clarify what I mean:
After BERT has been trained and I want to use the pretrained embeddings for some other NLP task, can I extract all the word-level embeddings from BERT once, for every word in my dictionary, and keep a static set of key-value word-embedding pairs from which I retrieve the embedding for, say, "bank"? Or will the embedding for "bank" change depending on whether the sentence is "Trees grow on the river bank" or "I deposited money at the bank"?
And if the latter is the case, how do I practically use the BERT embeddings for another NLP task? Do I need to run every input sentence through BERT before passing it into my own model?
Essentially: do the embeddings stay the same for each word/token after the model has been trained, or are they dynamically adjusted by the model weights based on the context?
This is a great question (I had the same question, and your asking it made me experiment a bit).
The answer is yes: the embeddings change based on the context, so you should not extract them once and reuse them (at least for most problems).
I'm checking the embedding of the word "bank" in two cases: (1) when it appears on its own and (2) when it appears with context ("river bank"). The embeddings I get are different from each other (they have a cosine distance of ~0.4).
import numpy as np
import tensorflow as tf
from transformers import TFBertModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')
print('bank is the second token in the encoded sequence (index=1):', tokenizer.decode(tokenizer.encode('bank')))
print('bank is the third token in the encoded sequence (index=2):', tokenizer.decode(tokenizer.encode('river bank')))
### output: bank is the second token in the encoded sequence (index=1): [CLS] bank [SEP]
### output: bank is the third token in the encoded sequence (index=2): [CLS] river bank [SEP]
# model(...)[0] is the last hidden state; pick the vector at the position of the 'bank' token
bank_bank = model(tf.constant(tokenizer.encode('bank'))[None, :])[0][0, 1, :]  # index 1, per the tokenizer output above
river_bank_bank = model(tf.constant(tokenizer.encode('river bank'))[None, :])[0][0, 2, :]  # index 2, per the tokenizer output above
are_equal = np.allclose(bank_bank, river_bank_bank)
print(are_equal)
### output: False
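To quantify the difference rather than just checking equality, you can also compute the cosine distance between the two vectors directly. Below is a minimal sketch using plain NumPy on the tensors produced above; the exact number depends on the model weights, but it should come out around the ~0.4 mentioned earlier.
# cosine distance = 1 - cosine similarity between the two contextual vectors for 'bank'
a = bank_bank.numpy()
b = river_bank_bank.numpy()
cosine_distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine_distance)
### output: roughly 0.4 (the exact value may vary)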