Model size is smaller in .onnx format than in .tflite format

I have a pre-trained PyTorch model that I want to convert to TFLite. The model comes from the SeisBench API. I used the code below for the conversion; it includes checks to confirm that each format conversion worked.

I followed the flow .pt -> .onnx -> TensorFlow -> TFLite, but the resulting .onnx file (98 kB) is smaller than the final TFLite model (108 kB). I am using the onnx-tensorflow library to convert the .onnx file to TensorFlow (https://github.com/onnx/onnx-tensorflow).

import torch
import onnx
import seisbench.models as sbm

model = sbm.PhaseNet.from_pretrained("instance") # load the model from the SeisBench API

#model.load_state_dict(pNET.state_dict())

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# Save model information

print(model.get_model_args())

input_lenght = model.in_samples
input_depth = model.in_channels

# save to .pt
model.eval() # switch layers such as dropout and batch norm to inference mode (this does not disable gradients)

torch.save(model, 'pNET.pt') 

# check if the model has been saved correctly
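# note: on PyTorch >= 2.6, loading a fully pickled model requires torch.load(..., weights_only=False)
temp_model = torch.load('pNET.pt')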
temp_model.eval()

print("Model's state_dict:")
for param_tensor in temp_model.state_dict():
    print(param_tensor, "\t", temp_model.state_dict()[param_tensor].size())


# save to .onnx

# define an input vector (random vector)
sample_input = torch.randn(1, input_depth, input_lenght, requires_grad=True) # shape is (batch, channels, samples)
# batch size fixed to 1: a single time series

# export

torch.onnx.export(
    model,                  # PyTorch Model
    sample_input,           # Input tensor
    'pNET.onnx',            # Output file name
    input_names=['input'],  # Input tensor name (arbitrary)
    output_names=['output'] # Output tensor name (arbitrary)
)

# check if the model has been saved correctly
onnx_model = onnx.load('pNET.onnx')

# Check that the IR is well formed
onnx.checker.check_model(onnx_model)

# Print a Human readable representation of the graph
print(onnx.helper.printable_graph(onnx_model.graph))  # printable_graph returns a string, so print it

# Try to run an inference with the newly saved onnx model

import onnxruntime as ort
import numpy as np

ort_session = ort.InferenceSession('pNET.onnx')

outputs = ort_session.run(
    None,
    {'input': np.random.randn(1, input_depth, input_lenght).astype(np.float32)} #random input
)

print(outputs) # list with one NumPy array per model output
print(outputs[0].shape) # check that the first output has the right shape

from onnx_tf.backend import prepare
# Converting to TensorFlow model
onnx_model = onnx.load("pNET.onnx")  # load onnx model
tf_rep = prepare(onnx_model)  # prepare tf representation
tf_rep.export_graph("pNET")  # export the model

# Check if the conversion worked 

# Run a TF inference

import tensorflow as tf

model = tf.saved_model.load("./pNET")
model.trainable = False

input_tensor = tf.random.uniform([1, input_depth, input_lenght])
out = model(**{'input': input_tensor})
print(out) #check if you get a tensor of the right shape
print(tf.nest.map_structure(lambda t: t.shape, out)) # works whether out is a single tensor or a dict of tensors

# float16 quantization

converter = tf.lite.TFLiteConverter.from_saved_model("./pNET")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()

# Save the model
with open('pNETlite16float.tflite', 'wb') as f:
    f.write(tflite_quant_model) # same size as when I use interpreter instead of converter?

My confusion stems from the fact that I expected post-training quantization to reduce the model size. Does TFLite add extra wrappers or methods to a model that make it larger than the .onnx file?
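
One way to narrow this down is to separate the bytes that hold weights from everything else in each file (graph structure, tensor names, metadata). The sketch below is a minimal helper and not part of the original code: `weight_payload_bytes` and `non_weight_overhead` are hypothetical names, and the estimate assumes every parameter is stored exactly once in a single dtype, which may not hold after conversion.

```python
import os

# Storage cost per parameter for the two dtypes discussed in the question.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_payload_bytes(num_params: int, dtype: str) -> int:
    """Estimate the bytes needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def non_weight_overhead(path: str, num_params: int, dtype: str) -> int:
    """File size minus the estimated weight payload.

    What remains is a rough measure of graph structure and metadata.
    Assumption: all weights are stored once, in the given dtype.
    """
    return os.path.getsize(path) - weight_payload_bytes(num_params, dtype)
```

With the script above, `num_params` could be taken as `sum(p.numel() for p in model.parameters())` on the PyTorch model (before `model` is rebound to the TensorFlow object), and the two files compared with `non_weight_overhead('pNET.onnx', num_params, 'float32')` and `non_weight_overhead('pNETlite16float.tflite', num_params, 'float16')`. For a model this small, a large overhead would suggest that graph structure rather than weights dominates the file size.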

Solution

We now support an official, direct conversion from PyTorch to TFLite. You can give it a try: https://github.com/google-ai-edge/ai-edge-torch
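
As a rough illustration of what the direct route might look like, here is a minimal sketch based on the `convert` and `export` calls shown in the ai-edge-torch README. The wrapper name `export_with_ai_edge_torch` is hypothetical, the sketch applies no quantization, and it is an assumption that the SeisBench PhaseNet model converts without changes.

```python
def export_with_ai_edge_torch(model, sample_args, out_path):
    """Convert a PyTorch module to a .tflite file via ai-edge-torch.

    model:       a torch.nn.Module
    sample_args: tuple of example input tensors used to trace the model
    out_path:    where to write the .tflite file
    """
    # Imported lazily so the helper can be defined without the package installed.
    import ai_edge_torch  # pip install ai-edge-torch

    edge_model = ai_edge_torch.convert(model.eval(), sample_args)
    edge_model.export(out_path)
    return edge_model
```

With the variables from the question, the call could look like `export_with_ai_edge_torch(model, (torch.randn(1, input_depth, input_lenght),), 'pNET_direct.tflite')`, where `model` is the PyTorch PhaseNet and the output file name is arbitrary. This skips the ONNX and onnx-tensorflow steps entirely, so the resulting file size can be compared against both earlier files.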