I am trying to run a Hugging Face BERT model with ONNX Runtime. I followed the docs to convert the model, and I am now trying to run inference. My inference code is:
from transformers import BertTokenizer, BertModel, BertTokenizerFast
import onnxruntime
sess = onnxruntime.InferenceSession("onnx/bert-base-cased/model.onnx")
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
encoded_input = tokenizer(text, return_tensors='pt', padding='max_length')
output = sess.run([i.name for i in sess.get_outputs()], dict(encoded_input)) # or sess.run(None, input_dict)
I am getting the following error:
Traceback (most recent call last):
  File "/home/srg/glib-repos/invoice_locality_extraction/cloud_run_functions/name_extraction/main.py", line 94, in invoice_extractor
    inference_results = infer.infer(v)
  File "/home/srg/glib-repos/invoice_locality_extraction/cloud_run_functions/name_extraction/infer.py", line 111, in infer
    emb, call = process(tokenizer, model, item_text_results[i:i+batch_size], call+1)
  File "/home/srg/glib-repos/invoice_locality_extraction/cloud_run_functions/name_extraction/get_embeddings.py", line 50, in process
    output = model.run([i.name for i in model.get_outputs()], input_dict)
  File "/home/sajan/pdf2words-env/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
RuntimeError: Input must be a list of dictionaries or a single numpy array for input 'attention_mask'.
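According to the docs, the tokenizer should be called with return_tensors='np', not return_tensors='pt'. ONNX Runtime expects NumPy arrays as inputs, whereas 'pt' makes the tokenizer return PyTorch tensors, which is what triggers the error for 'attention_mask'.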
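A minimal sketch of the fix, in case it helps. The helper name to_onnx_feed is hypothetical (it is not part of transformers or onnxruntime), and the int64 dtype is an assumption about how the model was exported, so check it against sess.get_inputs(). The encoded dict below is only a stand-in for the tokenizer output, with illustrative token IDs, so that the snippet runs without the model file.

```python
import numpy as np


def to_onnx_feed(encoded, input_names):
    """Build an input feed for InferenceSession.run from tokenizer output.

    Hypothetical helper: keeps only the inputs the model declares and
    converts each value to an int64 NumPy array (assumed export dtype).
    """
    feed = {}
    for name in input_names:
        value = encoded[name]
        # PyTorch tensors expose .detach(); convert them to NumPy first.
        if hasattr(value, "detach"):
            value = value.detach().cpu().numpy()
        feed[name] = np.asarray(value, dtype=np.int64)
    return feed


# Stand-in for: tokenizer(text, return_tensors="np", padding="max_length")
encoded = {
    "input_ids": [[101, 7592, 102, 0]],
    "token_type_ids": [[0, 0, 0, 0]],
    "attention_mask": [[1, 1, 1, 0]],
}

# With a real session: input_names = [i.name for i in sess.get_inputs()]
input_names = ["input_ids", "attention_mask", "token_type_ids"]
feed = to_onnx_feed(encoded, input_names)

# With a real session: output = sess.run(None, feed)
```

If the tokenizer is already called with return_tensors='np', passing dict(encoded_input) directly to sess.run should be enough; the helper is only useful when the inputs may arrive as PyTorch tensors or plain lists.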