I have already installed BERT, but I don't know how to get non-contextual word embeddings.
For example:
input: 'Apple'
output: [1,2,23,2,13,...] #embedding of 'Apple'
How can I get these word embeddings?
Thank you.
I have searched for a method, but I couldn't find any blog post that explains how to do this.
Solved.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Get the embedding of a single word from BERT
def get_word_embedding(word: str):
    input_ids = torch.tensor(tokenizer.encode(word)).unsqueeze(0)  # batch size 1: [CLS] word [SEP]
    with torch.no_grad():
        outputs = model(input_ids)
    last_hidden_states = outputs[0]  # the last hidden state is the first element of the output
    # outputs[0] holds one vector per input token
    # outputs[1] is the pooler output (the [CLS] vector passed through a dense layer and tanh)
    return last_hidden_states[0][1]  # vector of the first token after [CLS]
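
For example, calling the function above gives one hidden-size vector (768 dimensions for bert-base-uncased). Note that if the word is split into several subword tokens, index 1 only returns the first subword's vector:

embedding = get_word_embedding("apple")
print(embedding.shape)  # torch.Size([768])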
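
One caveat: the code above runs the word through the full encoder, so the returned vector is still computed in the context of the surrounding [CLS] and [SEP] tokens. For truly non-contextual (static) embeddings you can read them straight from the model's input embedding layer instead. A minimal sketch, reusing the tokenizer and model defined above; the helper name get_static_embedding and the choice to average subword vectors are my own assumptions, not part of the original answer:

# Look up the static input embeddings (no encoder pass, fully non-contextual)
def get_static_embedding(word: str):
    token_ids = tokenizer.encode(word, add_special_tokens=False)  # subword ids only, no [CLS]/[SEP]
    embedding_layer = model.get_input_embeddings()  # BERT's wordpiece embedding matrix
    with torch.no_grad():
        vectors = embedding_layer(torch.tensor(token_ids))  # one vector per subword
    return vectors.mean(dim=0)  # averaging subwords is one common choice for a single word vector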