Tags: python, pytorch, onnx, quantization, onnxruntime

Converting PyTorch to ONNX model increases file size for ALBert


Goal: Use this Notebook to perform quantisation on albert-base-v2 model.

Kernel: conda_pytorch_p36.


Outputs in Sections 1.2 & 2.2 show that:

  • converting vanilla BERT from PyTorch to ONNX leaves the file size unchanged at 417.6 MB.
  • The quantized models are smaller than vanilla BERT: 173.0 MB in PyTorch and 104.8 MB in ONNX.

However, when running ALBert:

  • The PyTorch and ONNX model sizes differ.
  • The quantized model sizes are bigger than vanilla.

I think this is why both quantized ALBert models perform worse than vanilla ALBert.

PyTorch:

Size (MB): 44.58906650543213   (vanilla)
Size (MB): 22.373255729675293  (quantized)

ONNX:

ONNX full precision model size (MB): 341.64233207702637
ONNX quantized model size (MB): 85.53886985778809

Why might exporting ALBert from PyTorch to ONNX increase the model size, when it does not for BERT?

Please let me know if there's anything else I can add to the post.


Solution

  • Explanation

    The ALBert model shares weights among its layers. torch.onnx.export writes each shared weight out as a separate tensor, which makes the exported model larger.

    A number of GitHub issues have been marked as solved regarding this phenomenon.

    The most common solution is to remove the duplicated shared weights, that is, to drop initializer tensors that contain exactly the same values and redirect their consumers to a single copy.
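The core of that deduplication step can be sketched without any ONNX machinery. In this minimal, purely illustrative example, plain Python lists stand in for initializer tensors and `dedupe` is a hypothetical helper that maps each duplicate back to the index of its first occurrence:

```python
def dedupe(initializers):
    """Return {duplicate_index: first_index} for entries with identical values."""
    first_seen = {}  # value fingerprint -> index of first occurrence
    remap = {}
    for i, tensor in enumerate(initializers):
        key = tuple(tensor)  # hashable fingerprint of the tensor's values
        if key in first_seen:
            remap[i] = first_seen[key]  # duplicate: point back at the original
        else:
            first_seen[key] = i
    return remap

# Tensor 2 has exactly the same values as tensor 0, so it maps back to it.
weights = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]]
print(dedupe(weights))  # {2: 0}
```

The real ONNX version below does the same thing, except that tensors are compared by their raw contents and duplicates are removed by rewiring the graph rather than by returning a mapping.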


    Solutions

    Section "Removing shared weights" in onnx_remove_shared_weights.ipynb.

    Runnable version of the pseudo-code (the `has_same_value` helper is defined here via onnx.numpy_helper, since the original snippet left it undefined; `path` and `output_path` are placeholders for your model files):

    import onnx
    from onnx import numpy_helper
    from onnxruntime.transformers.onnx_model import OnnxModel

    def has_same_value(t1, t2):
        # Shared weights are exact duplicates: same shape, dtype, and bytes.
        if t1.dims != t2.dims or t1.data_type != t2.data_type:
            return False
        return numpy_helper.to_array(t1).tobytes() == numpy_helper.to_array(t2).tobytes()

    model = onnx.load(path)
    onnx_model = OnnxModel(model)

    count = len(model.graph.initializer)
    same = [-1] * count  # same[j] = i means initializer j duplicates initializer i
    for i in range(count - 1):
        if same[i] >= 0:
            continue
        for j in range(i + 1, count):
            if same[j] < 0 and has_same_value(model.graph.initializer[i],
                                              model.graph.initializer[j]):
                same[j] = i

    # Redirect every consumer of a duplicate initializer to the first copy.
    for i in range(count):
        if same[i] >= 0:
            onnx_model.replace_input_of_all_nodes(
                model.graph.initializer[i].name,
                model.graph.initializer[same[i]].name)

    onnx_model.update_graph()
    onnx_model.save_model_to_file(output_path)


    Source of both solutions