tensorflow · neural-network · quantization · tensorflow-lite

How to quantize TensorFlow Lite model to 16-bit


The following command creates an 8-bit quantized TF Lite model, and replacing QUANTIZED_UINT8 with FLOAT creates a 32-bit float model. Is there a flag that creates a 16-bit quantized model? I've searched the TF Lite documentation but couldn't find a list of the possible flag values. Does anyone know how to do this?

~/tensorflow/bazel-bin/tensorflow/contrib/lite/toco/toco \
  --input_file=$(pwd)/model.pb \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --output_file=$(pwd)/model.lite --inference_type=QUANTIZED_UINT8 \
  --input_type=QUANTIZED_UINT8 --input_arrays=conv2d_1_input \
  --default_ranges_min=0.0 --default_ranges_max=1.0 \
  --output_arrays=average_pooling2d_2/AvgPool --input_shapes=1024,32,32,2
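
For reference, the same 8-bit conversion can also be expressed through the Python converter API. This is a minimal sketch, assuming the TF 1.x tf.lite.TFLiteConverter and the tensor names from the command above; the quantized_input_stats values are illustrative.

import tensorflow as tf  # TF 1.x

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="model.pb",
    input_arrays=["conv2d_1_input"],
    output_arrays=["average_pooling2d_2/AvgPool"],
    input_shapes={"conv2d_1_input": [1024, 32, 32, 2]},
)
# Equivalent of --inference_type=QUANTIZED_UINT8
converter.inference_type = tf.uint8
# (mean, std_dev) for the uint8 input; these values are illustrative
converter.quantized_input_stats = {"conv2d_1_input": (0.0, 1.0)}
# Fallback (min, max) ranges, like --default_ranges_min / --default_ranges_max
converter.default_ranges_stats = (0.0, 1.0)

tflite_model = converter.convert()
with open("model.lite", "wb") as f:
    f.write(tflite_model)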

Solution

  • Currently, the only quantized type that TFLite supports is 8 bits. See here: https://github.com/tensorflow/tensorflow/blob/54b62eed204fbc4e155fbf934bee9b438bb391ef/tensorflow/lite/toco/types.proto#L27

    This is because 8 bits was found to be sufficient for existing quantized models, but that may change. If you have a model that needs more bits for quantization, it may be worthwhile to create a TensorFlow issue describing your use case.
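
As the answer anticipates, this did change: later TensorFlow releases added float16 post-training quantization through the Python converter. A minimal sketch, assuming TF 2.x and a SavedModel directory (the path is illustrative):

import tensorflow as tf  # TF 2.x

# Post-training float16 quantization: weights are stored as 16-bit floats.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # illustrative path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_model)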