I have a Keras (not tf.keras) model which I quantized post-training so that it can run on an embedded device.
To convert the model to a quantized TFLite model, I tried several approaches and ended up with about five quantized versions. They differ slightly in size, they all seem to work on my x86 machine, and they show different inference timings.
Now I would like to check how the models are actually quantized (fully, weights only, ...), since the embedded solution only accepts a fully quantized model. I would also like to see more details, e.g. how the weights differ (which might explain the different model sizes). The model summary does not give any insight into this.
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use the TF1.x converter for the conversion, or tf.compat.v1.lite.TFLiteConverter.from_keras_model_file when working with TF2.x, so I am not sure whether the resulting model differs between the "classic" TF1.x converter and the tf.compat.v1 version.
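For reference, a minimal sketch of what such a full-integer post-training conversion could look like with the tf.compat.v1 converter under TF2.x; the model path, the input shape, and the representative dataset generator are placeholders, not the actual setup:

```python
import numpy as np
import tensorflow as tf

def representative_dataset_gen():
    # Placeholder calibration data; in practice, yield real input samples
    # with the same shape and preprocessing as the training data.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("model.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
# Restrict to int8 kernels so the converter errors out instead of
# silently leaving float ops in the graph.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```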
How the different models were created:
Converting an h5 model with TF1.3
Converting an h5 model with TF1.5.3
Converting an h5 model with TF2.2
Converting the h5 model to pb with TF1.3
Converting the h5 model to pb with TF1.5
Converting the h5 model to pb with TF2.2
Converting the resulting pb models with TF1.5.3
Converting the resulting pb models with TF2.2 (a sketch of the pb route follows below)
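For the pb variants, a hedged sketch of what the frozen-graph route might look like; the file name, the input/output tensor names, and the input shape are assumptions that depend on the actual graph:

```python
import tensorflow as tf

# Assumed tensor names and input shape; check the actual frozen graph
# (e.g. in Netron) for the real names.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="model.pb",
    input_arrays=["input_1"],
    output_arrays=["output/Softmax"],
    input_shapes={"input_1": [1, 224, 224, 3]},
)
# The same optimizations / representative_dataset / supported_ops settings
# as in the h5 sketch above would apply for full-integer quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```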
Netron is a handy tool for visualizing networks. You can select individual layers and inspect the types and values of weights, biases, inputs, and outputs.
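If you also want a programmatic check, something along these lines (the model path is a placeholder) lists the dtype and quantization parameters of every tensor, which shows whether a model is fully quantized or only has quantized weights:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")  # placeholder path
interpreter.allocate_tensors()

# A fully quantized model should report uint8/int8 here, not float32.
print("inputs :", [(d["name"], d["dtype"]) for d in interpreter.get_input_details()])
print("outputs:", [(d["name"], d["dtype"]) for d in interpreter.get_output_details()])

# dtype and (scale, zero_point) for every tensor in the graph;
# weight tensors that are still float32 indicate weight-only / partial quantization.
for t in interpreter.get_tensor_details():
    print(t["name"], t["dtype"], t["quantization"])
```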