I use the following code to generate a quantized tflite model:
import tensorflow as tf

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
But according to the post-training quantization docs:
The resulting model will be fully quantized but still take float input and output for convenience.
To compile the tflite model for the Google Coral Edge TPU, I need quantized input and output as well.
In the model, I can see that the first network layer converts the float input to input_uint8 and the last layer converts output_uint8 to the float output.
How do I edit the tflite model to get rid of the first and last float layers?
I know that I could set the input and output types to uint8 during conversion, but that is not compatible with any optimizations. The only remaining option then is fake quantization, which results in a bad model.
You can avoid the float-to-int8 and int8-to-float "quant/dequant" ops by setting inference_input_type and inference_output_type (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/lite.py#L460-L476) to int8.
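A minimal sketch of what that looks like, reusing saved_model_dir and representative_dataset_gen from the question's code; the TFLITE_BUILTINS_INT8 target spec is an extra assumption here to force full-integer quantization of the rest of the graph:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen

# Assumption: restrict conversion to int8 kernels so the whole graph is quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Request int8 input and output tensors instead of the default float32,
# which drops the leading quantize and trailing dequantize layers.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()

If your Edge TPU compiler version expects uint8 rather than int8 at the model boundary, tf.uint8 can be set the same way, assuming your TensorFlow version accepts it for these attributes.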