Tags: python, deep-learning, onnx, quantization, onnxruntime

onnx.load() | ALBert throws DecodeError: Error parsing message


Goal: re-develop this BERT Notebook to use textattack/albert-base-v2-MRPC.

Kernel: conda_pytorch_p36. PyTorch 1.8.1+cpu.

I convert a PyTorch / HuggingFace Transformers model to ONNX and save it to disk. A DecodeError is then raised by onnx.load().
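For reference, the conversion step follows roughly the pattern below. This is only a sketch of the usual transformers / torch.onnx.export flow, not the notebook's exact code; the model name, sequence length and file name are assumptions.

import torch
from transformers import AlbertForSequenceClassification, AlbertTokenizer

model_name = "textattack/albert-base-v2-MRPC"   # assumed; the notebook drives this via its config variables
model = AlbertForSequenceClassification.from_pretrained(model_name)
tokenizer = AlbertTokenizer.from_pretrained(model_name)
model.eval()

# Dummy paired-sentence input so the exporter can trace the graph.
dummy = tokenizer("first sentence", "second sentence",
                  return_tensors="pt", padding="max_length",
                  truncation=True, max_length=128)

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "albert.onnx",
    opset_version=11,
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"},
                  "token_type_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch"}},
)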

Are my ONNX files corrupted? Corruption seems to be a common cause of this error, but I don't know how to check for it.
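One way to check: try to parse each file and run the ONNX checker over it, and compare the file sizes against what the export should produce (an ALBERT-base export is on the order of tens of MB). A minimal sketch, with the file names taken from the notebook:

import os
import onnx

for path in ["albert.onnx", "albert.opt.onnx"]:
    print(path, "size (MB):", os.path.getsize(path) / (1024 * 1024))
    try:
        model = onnx.load(path)             # parses the protobuf
        onnx.checker.check_model(model)     # validates the graph structure
        print(path, "parsed and passed the checker")
    except Exception as exc:
        print(path, "failed:", type(exc).__name__, exc)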

ALBert Notebook and model files on Google Colab.

I've also opened this Git issue, detailing my debugging.


The problem isn't...

  • Quantisation - any quantisation code I try throws the same error.
  • Optimisation - the error occurs with or without optimisation.

Section 2.2 Quantize ONNX model:

from onnxruntime.quantization import quantize_dynamic, QuantType
import onnx

def quantize_onnx_model(onnx_model_path, quantized_model_path):    
    onnx_opt_model = onnx.load(onnx_model_path)
    quantize_dynamic(onnx_model_path,
                     quantized_model_path,
                     weight_type=QuantType.QInt8)

    logger.info(f"quantized model saved to:{quantized_model_path}")

quantize_onnx_model('albert.opt.onnx', 'albert.opt.quant.onnx')

print('ONNX full precision model size (MB):', os.path.getsize('albert.opt.onnx')/(1024*1024))
print('ONNX quantized model size (MB):', os.path.getsize("albert.opt.quant.onnx")/(1024*1024))

Traceback:

---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
<ipython-input-16-2d2d32b0a667> in <module>
     10     logger.info(f"quantized model saved to:{quantized_model_path}")
     11 
---> 12 quantize_onnx_model('albert.opt.onnx', 'albert.opt.quant.onnx')
     13 
     14 print('ONNX full precision model size (MB):', os.path.getsize("albert.opt.onnx")/(1024*1024))

<ipython-input-16-2d2d32b0a667> in quantize_onnx_model(onnx_model_path, quantized_model_path)
      3 
      4 def quantize_onnx_model(onnx_model_path, quantized_model_path):
----> 5     onnx_opt_model = onnx.load(onnx_model_path)
      6     quantize_dynamic(onnx_model_path,
      7                      quantized_model_path,

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
    119     '''
    120     s = _load_bytes(f)
--> 121     model = load_model_from_string(s, format=format)
    122 
    123     if load_external_data:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in load_model_from_string(s, format)
    156     Loaded in-memory ModelProto
    157     '''
--> 158     return _deserialize(s, ModelProto())
    159 
    160 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/onnx/__init__.py in _deserialize(s, proto)
     97                          '\ntype is {}'.format(type(proto)))
     98 
---> 99     decoded = cast(Optional[int], proto.ParseFromString(s))
    100     if decoded is not None and decoded != len(s):
    101         raise google.protobuf.message.DecodeError(

DecodeError: Error parsing message

Output Files:

albert.onnx  # original save
albert.opt.onnx  # optimised version save

Solution

  • The main problem was that I hadn't updated the config variables for my new model.

    Changes:

    configs.output_dir = "albert-base-v2-MRPC"
    configs.model_name_or_path = "albert-base-v2-MRPC"
    

    I then came across a separate issue: I hadn't git cloned my model properly. The question and answer are detailed here. (A quick check for this is sketched at the end of this answer.)

    Lastly, HuggingFace 🤗 does not have an equivalent to BertOptimizationOptions for ALBert. I had tried the general PyTorch optimisers offered by torch_optimizer on the ONNX model, but those are training optimisers that act on PyTorch parameters, so they aren't compatible with ONNX models. (See the graph-optimizer sketch at the end of this answer.)

    Feel free to comment for further clarification.
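
Regarding the git clone issue above: if a Hugging Face model repository is cloned without Git LFS, the large weight files are left behind as tiny text pointer files, which later fail to parse exactly like a corrupted file would. A quick heuristic check (the local directory name is an assumption):

import os

def looks_like_lfs_pointer(path):
    # Git LFS pointer files are ~130 bytes and start with a fixed "version" header.
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(7) == b"version"

weights = os.path.join("albert-base-v2-MRPC", "pytorch_model.bin")
if os.path.exists(weights):
    print(weights, "is an LFS pointer:", looks_like_lfs_pointer(weights))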
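
On the optimisation point: ONNX Runtime's transformer graph optimizer can be pointed at an ALBERT export with model_type="bert", since ALBERT reuses BERT's attention layout; whether the fusions actually fire needs verifying. A sketch under that assumption (shapes are for albert-base-v2):

from onnxruntime.transformers import optimizer

opt_model = optimizer.optimize_model(
    "albert.onnx",          # the unoptimised export
    model_type="bert",      # ALBERT shares BERT's attention pattern
    num_heads=12,           # albert-base-v2
    hidden_size=768,
)
opt_model.save_model_to_file("albert.opt.onnx")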