python, pytorch, nlp, bert-language-model

How to get Non-contextual Word Embeddings in BERT?


I have already installed BERT, but I don't know how to get non-contextual word embeddings.

For example:


input: 'Apple'
output: [1,2,23,2,13,...] #embedding of 'Apple'


How can I get these word embeddings?

Thank you.

I searched for a method, but no blog posts describe how to do it.


Solution

  • Solved.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # get the word embedding from BERT
    def get_word_embedding(word: str):
        # Batch size 1; encode() adds the [CLS] and [SEP] special tokens
        input_ids = torch.tensor(tokenizer.encode(word)).unsqueeze(0)
        with torch.no_grad():  # no gradients needed for inference
            outputs = model(input_ids)
        # outputs[0]: last-layer hidden states, shape (1, seq_len, 768)
        # outputs[1]: the pooled [CLS] output, not needed here
        last_hidden_states = outputs[0]
        # Index 1 skips [CLS] and picks the vector of the word's first sub-token
        return last_hidden_states[0][1]
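Note that the snippet above still runs the word through the full encoder, so the returned vector depends on the surrounding special tokens. For a genuinely non-contextual (static) vector, you can instead read straight from BERT's input embedding table (`model.embeddings.word_embeddings`), which is a plain lookup with no attention involved. A minimal sketch, assuming the same `bert-base-uncased` checkpoint; the helper name `get_static_embedding` and the sub-token averaging are my own choices, not part of the original answer:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def get_static_embedding(word: str) -> torch.Tensor:
    """Look up the word's vector(s) in BERT's input embedding table.

    The table is position- and context-independent, so this is a truly
    non-contextual embedding. Words outside BERT's vocabulary are split
    into several WordPiece sub-tokens; here they are averaged.
    """
    token_ids = tokenizer.encode(word, add_special_tokens=False)
    with torch.no_grad():
        vectors = model.embeddings.word_embeddings(torch.tensor(token_ids))
    return vectors.mean(dim=0)  # shape: (768,) for bert-base

emb = get_static_embedding("apple")
print(emb.shape)  # torch.Size([768])
```

Because this skips the encoder entirely, the same word always maps to the same vector, which matches the "non-contextual" requirement in the question.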