Does TensorFlow support different quantization bit-widths for different layers, or must the same technique be applied to the whole model?
For example, let's say I perform 16-bit quantization at layer n. Can I perform 8-bit quantization at layer n+1?
No, as of now there is no option to define a different dtype for different layers of a model.
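For instance, with TFLite post-training quantization the scheme is chosen once on the converter and applied to the entire model. A minimal sketch, assuming an already-trained Keras model (the model below is just a placeholder):

```python
import tensorflow as tf

# Placeholder model; any trained Keras model would work the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# The quantization choice is made once for the whole model, e.g. float16;
# there is no converter option to mix 16-bit and 8-bit between layers.
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
```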
As per the documentation of tf.keras.layers.Layer, the class from which all layers inherit:
dtype - The dtype of the layer's computations and weights (default of None means use tf.keras.backend.floatx in TensorFlow 2, or the type of the first input in TensorFlow 1).
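A minimal sketch of the default behavior the documentation describes: a layer's dtype falls back to the global tf.keras.backend.floatx() setting rather than being chosen per layer.

```python
import tensorflow as tf

# In TensorFlow 2, a layer created with dtype=None uses the global floatx.
print(tf.keras.backend.floatx())        # 'float32' by default

layer = tf.keras.layers.Dense(4)
print(layer.dtype)                      # 'float32', inherited from floatx

# set_floatx changes this default globally, for all layers created afterwards.
tf.keras.backend.set_floatx('float16')
print(tf.keras.layers.Dense(4).dtype)   # 'float16'
```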