I am trying to convert a sentence to an embedding with the following code:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "[CLS] This is a sentence. [SEP]"
tokens = tokenizer.tokenize(text)
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
encoded_layers, pooled_output = model(input_ids, output_all_encoded_layers=False)
The code runs, but each time I run it, it gives a different result: encoded_layers and pooled_output change every time for the same input.
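For example, a quick check along these lines (using the same input_ids as above) illustrates what I mean:

out1, _ = model(input_ids, output_all_encoded_layers=False)
out2, _ = model(input_ids, output_all_encoded_layers=False)
print(torch.allclose(out1, out2))  # False, even though the input is identical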
Thank you for your help!
Maybe "dropout" works while inferencing. You can try model.eval()
In addition, "transformers" is Long-Time-Support. Stop using pytorch_pretrained_bert
import torch
from transformers import BertTokenizerFast, BertModel

bert_path = "bert-base-uncased"  # or a local path to a downloaded checkpoint
tokenizer = BertTokenizerFast.from_pretrained(bert_path)
model = BertModel.from_pretrained(bert_path)
model.eval()  # disable dropout so repeated runs give identical outputs

max_length = 32
test_str = "This is a sentence."

# The tokenizer adds [CLS]/[SEP] itself and pads to max_length.
tokenized = tokenizer(test_str, max_length=max_length, padding="max_length", truncation=True)

# Wrap the token id lists in tensors and add a batch dimension.
input_ids = torch.unsqueeze(torch.LongTensor(tokenized["input_ids"]), 0)
attention_mask = torch.unsqueeze(torch.LongTensor(tokenized["attention_mask"]), 0)

with torch.no_grad():  # inference only, no gradients needed
    res = model(input_ids, attention_mask=attention_mask)
print(res.last_hidden_state)
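last_hidden_state has shape (batch, max_length, hidden_size), i.e. one vector per token. If you want a single vector per sentence, one common approach is mean pooling over the token embeddings using the attention mask, roughly like this (a sketch building on the variables above):

# Mean-pool the token embeddings, ignoring padding positions.
hidden = res.last_hidden_state               # (1, max_length, 768)
mask = attention_mask.unsqueeze(-1).float()  # (1, max_length, 1)
sentence_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)              # torch.Size([1, 768])

Taking the [CLS] token, res.last_hidden_state[:, 0], is another option, but mean pooling usually works better for sentence similarity tasks.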