Tags: c#, pytorch, artificial-intelligence, onnx, onnxruntime

PyTorch model converted to ONNX: inference issue


I have converted a model from Hugging Face to ONNX using the tools provided:

optimum-cli export onnx --model deepset/roberta-base-squad2 "roberta-base-squad2" --framework pt

The conversion completes with no errors.
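
Before wiring up the full pipeline, it can help to confirm what the exported graph actually expects. Here is a minimal sketch (reusing the model path from the code below) that dumps the input and output metadata:

    using System;
    using Microsoft.ML.OnnxRuntime;

    // Sanity-check the export: list the graph's inputs and outputs with
    // their element types and declared shapes (-1 marks a dynamic dimension).
    using var session = new InferenceSession(@"C:\Bert\dslimL\model.onnx");
    foreach (var kvp in session.InputMetadata)
        Console.WriteLine($"input:  {kvp.Key} {kvp.Value.ElementType} [{string.Join(",", kvp.Value.Dimensions)}]");
    foreach (var kvp in session.OutputMetadata)
        Console.WriteLine($"output: {kvp.Key} {kvp.Value.ElementType} [{string.Join(",", kvp.Value.Dimensions)}]");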

I use the following code to make an inference:

        // QnA Service Configuration:
        // Site: https://huggingface.co/deepset/roberta-base-squad2
        Configuration QnAConfig = new Configuration(@"C:\Bert\dslimL\model.onnx")
        {
            ConfigurationFileName = "config.json",
            HasTokenTypeIds = false,
            IsCasedModel = false,
            MaximumNumberOfTokens = 1,
            MergesFileName = "merges.txt",
            NumberOfTokens = 5,
            Repository = "deepset/roberta-large-squad2",
            TokenizerName = TokenizerName.Tokenizer,
            VocabularyFileName = "vocab.json"
        };

        Configuration = QnAConfig;
        labelsCount = Configuration.ModelConfiguration.IdTolabel.Count;

        var sessionOptions = new SessionOptions()
        {
            ExecutionMode = ExecutionMode.ORT_PARALLEL,
            EnableCpuMemArena = true,
            EnableMemoryPattern = true,
            EnableProfiling = true,
            GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL,
            InterOpNumThreads = 10
        };
        sessionOptions.AppendExecutionProvider_CPU(0);

        Session = new InferenceSession(Configuration.ModelPath, sessionOptions);

        // Set Question and Context:
        Schema.Question = sentence;
        Schema.Context = Context;

        // Process SubContext:
        Schema.Sentence = $"'question': '{sentence}',"
                        + $"   'context': '{Context}'";

        // Look up the special-token ids:
        var Start = Configuration.Tokenizer.Model.TokenToId("<s>");
        var End = Configuration.Tokenizer.Model.TokenToId("</s>");
        var Pad = Configuration.Tokenizer.Model.TokenToId("<pad>");

        // Encode the sentence and decode it back as a check:
        var result = Configuration.Tokenizer.Encode(Schema.Sentence);
        var decode = Configuration.Tokenizer.Decode(result.Ids);

        // Build the long[] input ids (adding the start/end ids):
        var inputArray = result.Ids.ToLongArray((long)Start, (long)End);
        var MyAttentionMask = AttentionMaskHelpers.BuildMask(20, 20, -0);   // note: not used below

        // Attention mask: 1 for the first half of the positions, 0 for the rest:
        long[] attMask = new long[inputArray.Length];
        for (int i = 0; i < inputArray.Length; i++)
            if (i < inputArray.Length * 0.5)
                attMask[i] = 1;
            else
                attMask[i] = 0;

        // Convert the id and mask arrays to tensors:
        var tensorInputIds = TensorExtensions.ConvertToTensor(inputArray, inputArray.Length);
        var attention_Mask = TensorExtensions.ConvertToTensor(attMask, inputArray.Length);

        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor("input_ids", tensorInputIds),
            NamedOnnxValue.CreateFromTensor("attention_mask", attention_Mask),
            //NamedOnnxValue.CreateFromTensor("token_type_ids", attention_Mask)
        };


        // Convert the answer (tokens) back to the original text:
        //   Score:  score from the model
        //   Start:  index of the first character of the answer in the context string
        //   End:    index of the character following the last character of the answer in the context string
        //   Answer: plain text of the answer
        // See: https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/question_answering.py
        // Run the session, inferring an output:
        var inputMeta = Session.InputMetadata;   // note: not used below
        var output = Session.Run(inputs);

        // Init a new Answer Id List:
        List<int> AnswerIds = new List<int>();

        // The output logits:
        List<float> startLogits = output[0].AsEnumerable<float>().ToList();
        List<float> endLogits = output[1].AsEnumerable<float>().ToList();

        // Get the maximum start and end logits:
        float start = startLogits.Max();
        float end = endLogits.Max();
        Schema.Score = ((start + end) / 10.0f);

        // Get Indexes of the top scores:
        Schema.StartIndex = startLogits.IndexOf(start);
        Schema.EndIndex = endLogits.IndexOf(end);

        // Tokenise the sentence again (note: this result is not used below):
        TokenizerResult Tokens = Configuration.Tokenizer.Encode(Schema.Sentence);

        // Get the List of Ids:
        for (int i = Schema.StartIndex; i <= Schema.EndIndex; i++)
            AnswerIds.Add(Convert.ToInt32(inputArray[i]));

        // Get the Answer:
        Schema.Answer = Configuration.Tokenizer.Decode(AnswerIds).Trim();
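
One small robustness note: the two logits tensors can also be fetched by name rather than by position. optimum's question-answering export typically names them start_logits and end_logits, which is worth confirming against Session.OutputMetadata. A hedged sketch:

    // Sketch: look the outputs up by name instead of relying on their order.
    // "start_logits"/"end_logits" are the names optimum usually emits for
    // question-answering heads; verify against Session.OutputMetadata.
    var outputsByName = output.ToDictionary(v => v.Name);
    List<float> startLogits = outputsByName["start_logits"].AsEnumerable<float>().ToList();
    List<float> endLogits = outputsByName["end_logits"].AsEnumerable<float>().ToList();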

I am using:

 using Microsoft.ML.Tokenizers;

The Tokenizer works:

Tokenizer = new Tokenizer(new EnglishRoberta(vocabFilePath, mergeFilePath, dictPath), RobertaPreTokenizer.Instance);

I have checked the tokens and the IDs, and they match, so the tokenization process is good.
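
For anyone reproducing this, a quick round-trip check along the same lines (a sketch reusing the constructor arguments above; 0 is the id RoBERTa's vocab.json assigns to <s>):

    // Round-trip sanity check with the same tokenizer as above:
    var tokenizer = new Tokenizer(new EnglishRoberta(vocabFilePath, mergeFilePath, dictPath), RobertaPreTokenizer.Instance);
    var encoded = tokenizer.Encode("Why is model conversion important?");
    Console.WriteLine(string.Join(", ", encoded.Ids));    // the token ids
    Console.WriteLine(tokenizer.Decode(encoded.Ids));     // should reproduce the input text
    Console.WriteLine(tokenizer.Model.TokenToId("<s>"));  // 0 in RoBERTa's vocab.json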

PROBLEM: All predictions result in a zero index for both the start and the end index, which gives me the start token as the answer: if the start token is '<s>', then the result is '<s>', whether the score is good or bad.
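
For reference, the Hugging Face pipeline linked in the code above does not take a raw argmax over the whole sequence: it masks out the question and special tokens first, so the argmax can only land inside the context. A plain argmax, as in the code above, can therefore easily land on index 0, the <s> token. A minimal sketch of a context-restricted argmax, where contextStart/contextEnd are hypothetical token positions you would track while encoding:

    // A sketch, not the pipeline's exact logic: restrict the argmax to the
    // context span. contextStart/contextEnd are hypothetical positions
    // marking where the context begins and ends in the encoded input.
    static int ArgMax(IReadOnlyList<float> logits, int from, int to)
    {
        int best = from;
        for (int i = from; i <= to; i++)
            if (logits[i] > logits[best])
                best = i;
        return best;
    }

    Schema.StartIndex = ArgMax(startLogits, contextStart, contextEnd);
    Schema.EndIndex = ArgMax(endLogits, Schema.StartIndex, contextEnd); // keep end >= start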

The model works fine in Python using the following script, with no errors:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(QA_input)

print(res)

I believe this issue is directly related to ONNX and the onnxruntime environment, unless I am doing something wrong; however, I have other models working with similar code.

I have tried multiple models:

  • deepset/roberta-base-squad2
  • deepset/roberta-large-squad2

The same issue occurs with all of them.

I believe onnxruntime is buggier than most would like to admit; a real shame!


Solution

  • My advice: don't waste your time with ONNX!

    Spend the time learning Python instead, and use Flask to build Web API modules behind IIS, rather than wasting it on ONNX, because it is just too buggy and unreliable!

    There are plenty of good examples to learn from:

    from flask import Flask, jsonify, request
    # import objects from the transformers library
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

    app = Flask(__name__)  # define the app using Flask
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
    pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)

    # hypothetical route, added here so the example is actually callable:
    @app.route("/classify", methods=["POST"])
    def classify():
        # run the posted text through the pipeline and return the scores as JSON
        return jsonify(pipe(request.json["text"]))

    Don't waste your time with onnx!
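
For what it's worth, consuming such a Flask service from the existing C# code is only a few lines with HttpClient (a sketch; the /classify route is the hypothetical one added in the example above):

    // Minimal C# client for the Flask service sketched above. The
    // /classify route is a hypothetical addition, not a published API.
    using System;
    using System.Net.Http;
    using System.Net.Http.Json;

    using var http = new HttpClient();
    var response = await http.PostAsJsonAsync(
        "http://localhost:5000/classify", new { text = "I love this!" });
    Console.WriteLine(await response.Content.ReadAsStringAsync());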