Tags: deep-learning, pytorch, onnx, tensorrt

Number of parameters and FLOPs in ONNX and TensorRT models


Does the number of parameters and FLOPs (floating-point operations) change when converting a model from PyTorch to the ONNX or TensorRT format?


Solution

  • I don't think Anvar's post answered the OP's question thoroughly, so I did a bit of research. First, some general background before the answers, since I believe the OP hasn't fully understood which TensorRT and ONNX optimizations happen during the conversion from the PyTorch format.

    Both conversions, PyTorch to ONNX and ONNX to TensorRT, increase the performance of the model by applying several different optimizations. Both tools print information about what they do if you enable their verbose flags.
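    As a minimal sketch of the PyTorch-to-ONNX step (the model and input shape below are placeholders, not from the original post):

    ```python
    import torch
    import torchvision

    # Placeholder model and dummy input; substitute your own network.
    model = torchvision.models.resnet18(weights=None).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # verbose=True prints a description of the exported graph, so you can
    # see which nodes were simplified during export.
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["input"],
        output_names=["output"],
        verbose=True,
    )
    ```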

    The preferred way to convert a PyTorch model to TensorRT is to use Torch-TensorRT, as explained here.
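    A hedged sketch of that route, assuming the torch_tensorrt package is installed and a GPU is available (the model and input shape are again illustrative):

    ```python
    import torch
    import torchvision
    import torch_tensorrt

    model = torchvision.models.resnet18(weights=None).eval().cuda()

    # Layer/tensor fusion and kernel selection happen inside this call.
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float32},
    )

    x = torch.randn(1, 3, 224, 224, device="cuda")
    y = trt_model(x)
    ```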

    TensorRT fuses layers and tensors in the model graph; it then uses a large kernel library to select the implementations that perform best on the target GPU.

    ONNX Runtime mostly offers graph optimizations, such as graph simplifications and node fusions, to improve performance.
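    For illustration, ONNX Runtime lets you pick the graph optimization level per session and dump the optimized graph to disk; this sketch assumes a model.onnx file exists:

    ```python
    import onnxruntime as ort

    sess_options = ort.SessionOptions()
    # ORT_ENABLE_ALL enables all graph optimizations (constant folding,
    # node fusions, layout optimizations, ...).
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Write the optimized graph out to inspect which rewrites were applied.
    sess_options.optimized_model_filepath = "model_optimized.onnx"

    session = ort.InferenceSession("model.onnx", sess_options)
    ```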

    1. Does the number of parameters change when converting a PyTorch model to ONNX or TensorRT?

    No: even though layers are fused, the number of parameters does not decrease unless the model contains redundant branches.
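    For reference, counting parameters on the PyTorch side is a one-liner; the model here is a placeholder:

    ```python
    import torchvision

    model = torchvision.models.resnet18(weights=None)
    # Sum the element counts of every parameter tensor in the model.
    n_params = sum(p.numel() for p in model.parameters())
    print(f"PyTorch parameter count: {n_params}")
    ```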

    I tested this by downloading the yolov5s.onnx model here. The original model has 7.2M parameters according to the repository authors. I then used this tool to count the parameters in yolov5s.onnx and got 7,225,917 as a result. Thus, the ONNX conversion did not reduce the number of parameters.
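    If you would rather not use an external tool, a rough equivalent is to sum the element counts of the graph initializers with the onnx package; this assumes all weights are stored as initializers, which is the common case for exported models:

    ```python
    import numpy as np
    import onnx

    model = onnx.load("yolov5s.onnx")
    # Initializers hold the weight tensors of the graph.
    n_params = sum(int(np.prod(init.dims)) for init in model.graph.initializer)
    print(f"ONNX parameter count: {n_params}")
    ```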

    I was not able to get equally detailed information for the TensorRT model, but you can get layer information using trtexec. There is a recent question about this, but it has no answers yet.
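    As a sketch, trtexec can build an engine from the ONNX model and dump per-layer information; the exact flags depend on your TensorRT version:

    ```sh
    # Build an engine and export per-layer information for inspection.
    trtexec --onnx=yolov5s.onnx \
            --saveEngine=yolov5s.engine \
            --dumpLayerInfo \
            --exportLayerInfo=layers.json
    ```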

    2. Does the number of FLOPs change when converting a PyTorch model to ONNX or TensorRT?

    According to this post, no: the conversions change how the operations are scheduled and fused, not how many floating-point operations the model performs.
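    If you want to verify this yourself on the PyTorch side, one option (my suggestion, not from the linked post) is fvcore's FLOP counter; the model is again a placeholder, and note that fvcore counts one fused multiply-add as one FLOP by convention:

    ```python
    import torch
    import torchvision
    from fvcore.nn import FlopCountAnalysis

    model = torchvision.models.resnet18(weights=None).eval()
    x = torch.randn(1, 3, 224, 224)

    # Trace the model and count operations for this specific input shape.
    flops = FlopCountAnalysis(model, x)
    print(f"Total FLOPs: {flops.total()}")
    ```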