I have a Keras (not tf.keras) model which I quantized post-training so that it can run on an embedded device.
To convert the model to a quantized TFLite model, I tried several approaches and ended up with about five quantized versions. They differ slightly in size, they all seem to work on my x86 machine, and they show different inference timings.
Now I would like to check how the models are actually quantized (fully, weights only, ...), since the embedded solution only accepts a fully quantized model. I would also like to see more details, e.g. how the weights differ (which might explain the different model sizes). The model summary does not give any insight into this.
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use the TF1.x converter for the conversion, or tf.compat.v1.lite.TFLiteConverter.from_keras_model_file when working with TF2.x, so I am not sure whether the resulting model differs between the "classic" TF1.x converter and the tf.compat.v1 version.
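For reference, a minimal sketch of what such a full-integer post-training conversion could look like with the tf.compat.v1 converter under TF2.x; the model path, the input shape, and the representative dataset generator are placeholders, not the actual setup:

```python
import numpy as np
import tensorflow as tf

def representative_dataset_gen():
    # Placeholder calibration data; in practice, yield real input samples
    # with the same shape and preprocessing as the training data.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("model.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
# Restrict to int8 kernels so the converter errors out instead of
# silently leaving float ops in the graph.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```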
How the different models were created:
Converting an h5 model with TF1.3
Converting an h5 model with TF1.5.3
Converting an h5 model with TF2.2
Converting the h5 model to pb with TF1.3
Converting the h5 model to pb with TF1.5
Converting the h5 model to pb with TF2.2
Converting the resulting pb models with TF1.5.3
Converting the resulting pb models with TF2.2 (a sketch of the pb route follows below)
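For the pb variants, a hedged sketch of what the frozen-graph route might look like; the file name, the input/output tensor names, and the input shape are assumptions that depend on the actual graph:

```python
import tensorflow as tf

# Assumed tensor names and input shape; check the actual frozen graph
# (e.g. in Netron) for the real names.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="model.pb",
    input_arrays=["input_1"],
    output_arrays=["output/Softmax"],
    input_shapes={"input_1": [1, 224, 224, 3]},
)
# The same optimizations / representative_dataset / supported_ops settings
# as in the h5 sketch above would apply for full-integer quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```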
Netron is a handy tool for visualizing networks. You can select individual layers and inspect the types and values of weights, biases, inputs, and outputs.
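If you also want a programmatic check, something along these lines (the model path is a placeholder) lists the dtype and quantization parameters of every tensor, which shows whether a model is fully quantized or only has quantized weights:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")  # placeholder path
interpreter.allocate_tensors()

# A fully quantized model should report uint8/int8 here, not float32.
print("inputs :", [(d["name"], d["dtype"]) for d in interpreter.get_input_details()])
print("outputs:", [(d["name"], d["dtype"]) for d in interpreter.get_output_details()])

# dtype and (scale, zero_point) for every tensor in the graph;
# weight tensors that are still float32 indicate weight-only / partial quantization.
for t in interpreter.get_tensor_details():
    print(t["name"], t["dtype"], t["quantization"])
```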