Goal: Use this notebook to perform quantisation on the albert-base-v2 model.
Kernel: conda_pytorch_p36
Outputs in Sections 1.2 & 2.2 show that, for BERT:

PyTorch full-precision model: 417.6 MB
PyTorch quantized model: 173.0 MB
ONNX quantized model: 104.8 MB

However, when running ALBert I get the output below. I think this is the reason both quantization methods of ALBert perform worse than vanilla ALBert.
PyTorch:
Size (MB): 44.58906650543213
Size (MB): 22.373255729675293
ONNX:
ONNX full precision model size (MB): 341.64233207702637
ONNX quantized model size (MB): 85.53886985778809
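The "Size (MB):" lines above can be reproduced by serializing the state dict to disk and measuring the file. A minimal sketch with a toy model (the helper name and model are illustrative, not from the notebook):

```python
import os

import torch
import torch.nn as nn

def model_size_mb(model):
    # Serialize the state dict and report the file size in MB,
    # matching the "Size (MB):" lines above.
    torch.save(model.state_dict(), "temp.p")
    size = os.path.getsize("temp.p") / 1e6
    os.remove("temp.p")
    return size

float_model = nn.Sequential(nn.Linear(512, 512), nn.Linear(512, 512))

# Dynamic quantization converts Linear weights to int8, which is why
# the quantized files above are roughly a quarter to half the size.
quant_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8)

print("Size (MB):", model_size_mb(float_model))
print("Size (MB):", model_size_mb(quant_model))
```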
Why might exporting ALBert from PyTorch to ONNX increase model size, but not for BERT?
Please let me know if there's anything else I can add to the post.
The ALBert model shares weights across its layers. torch.onnx.export writes each use of a shared weight out as a separate tensor, which makes the exported model grow much larger.
A number of GitHub issues have been marked solved regarding this phenomenon. The most common solution is to remove the shared weights, that is, to deduplicate initializer tensors that contain exactly the same values.
See the section "Removing shared weights" in onnx_remove_shared_weights.ipynb.
import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

model = onnx.load(path)  # path: the exported ONNX model file
onnx_model = OnnxModel(model)

count = len(model.graph.initializer)
same = [-1] * count
# Mark every initializer that duplicates an earlier one.
for i in range(count - 1):
    if same[i] >= 0:
        continue
    for j in range(i + 1, count):
        if OnnxModel.has_same_value(model.graph.initializer[i],
                                    model.graph.initializer[j]):
            same[j] = i

# Re-point all nodes that use a duplicate to the first copy.
for i in range(count):
    if same[i] >= 0:
        onnx_model.replace_input_of_all_nodes(model.graph.initializer[i].name,
                                              model.graph.initializer[same[i]].name)

# Drop the now-unreferenced duplicates and save.
onnx_model.update_graph()
onnx_model.save_model_to_file(output_path)