I use the following code to generate a quantized tflite model:
import tensorflow as tf

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
But according to the post-training quantization docs:
The resulting model will be fully quantized but still take float input and output for convenience.
To compile the tflite model for the Google Coral Edge TPU, I need quantized input and output as well.
In the model, I can see that the first network layer converts the float input to input_uint8 and the last layer converts output_uint8 to the float output.
How do I edit the tflite model to get rid of the first and last float layers?
I know that I could set the input and output types to uint8 during conversion, but that is not compatible with any optimizations. The only remaining option then is fake quantization, which results in a bad model.
You can avoid the float-to-int8 and int8-to-float "quant/dequant" ops by setting inference_input_type and inference_output_type (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/lite.py#L460-L476) to int8.
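A minimal sketch of what that looks like, reusing saved_model_dir and representative_dataset_gen from the question's code; the TFLITE_BUILTINS_INT8 target spec is an extra assumption here to force full-integer quantization of the rest of the graph:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen

# Assumption: restrict conversion to int8 kernels so the whole graph is quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Request int8 input and output tensors instead of the default float32,
# which drops the leading quantize and trailing dequantize layers.
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()

If your Edge TPU compiler version expects uint8 rather than int8 at the model boundary, tf.uint8 can be set the same way, assuming your TensorFlow version accepts it for these attributes.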